Abstract. We consider equation systems of the form $X_1 = f_1(X_1, \ldots, X_n), \ldots,
X_n = f_n(X_1, \ldots, X_n)$ where $f_1, \ldots, f_n$ are polynomials with positive real
coefficients. In vector form we denote such an equation system by
$\mathbf{X} = \mathbf{f}(\mathbf{X})$ and call $\mathbf{f}$ a system of positive
polynomials, short SPP. Equation systems of this kind appear naturally in the analysis
of stochastic models like stochastic context-free grammars (with numerous applications
to natural language processing and computational biology), probabilistic programs with
procedures, web-surfing models with back buttons, and branching processes. The least
nonnegative solution $\mu\mathbf{f}$ of an SPP equation $\mathbf{X} = \mathbf{f}(\mathbf{X})$
is of central interest for these models. Etessami and Yannakakis [EY09] have suggested a
particular version of Newton's method to approximate $\mu\mathbf{f}$.

We extend a result of Etessami and Yannakakis and show that Newton's method starting
at $\mathbf{0}$ always converges to $\mu\mathbf{f}$. We obtain lower bounds on the convergence
speed of the method. For so-called strongly connected SPPs we prove the existence of a threshold
$k_f \in \mathbb{N}$ such that for every $i \ge 0$ the $(k_f + i)$-th iteration of Newton's method
has at least $i$ valid bits of $\mu\mathbf{f}$. The proof yields an explicit bound for $k_f$
depending only on syntactic parameters of $\mathbf{f}$. We further show that for arbitrary SPP
equations Newton's method still converges linearly: there exists a threshold $k_f$ and an
$\alpha_f > 0$ such that for every $i \ge 0$ the $(k_f + \alpha_f \cdot i)$-th iteration of
Newton's method has at least $i$ valid bits of $\mu\mathbf{f}$. The proof yields an explicit
bound for $\alpha_f$; the bound is exponential in the number of equations in
$\mathbf{X} = \mathbf{f}(\mathbf{X})$, but we also show that it is essentially optimal. The proof
does not yield any bound for $k_f$; it only proves its existence. Constructing a bound for $k_f$
is still an open problem. Finally, we also provide a geometric interpretation of Newton's method
for SPPs.



1 Introduction 

We consider equation systems of the form

$$X_1 = f_1(X_1, \ldots, X_n) \qquad \ldots \qquad X_n = f_n(X_1, \ldots, X_n)$$

where $f_1, \ldots, f_n$ are polynomials with positive real coefficients. In vector form we denote
such an equation system by $\mathbf{X} = \mathbf{f}(\mathbf{X})$. The vector $\mathbf{f}$ of
polynomials is called a system of positive polynomials, or SPP for short. Figure 1 shows the graph
of a 2-dimensional SPP equation system $\mathbf{X} = \mathbf{f}(\mathbf{X})$.

Fig. 1. Graphs of the equations $X_1 = f_1(X_1, X_2)$ and $X_2 = f_2(X_1, X_2)$ of a 2-dimensional
SPP. There are two real solutions in $\mathbb{R}^2_{\ge 0}$; the least one is labelled
$\mu\mathbf{f}$.

Equation systems of this kind appear naturally in the analysis of stochastic context-free
grammars (with numerous applications to natural language processing [MS99,GJ02] and
computational biology [SBH+94,DEKM98,DE04,KH03]), probabilistic programs with procedures
[EKM04,BKS05,EY09,EY05a,EKM05,EY05b,EY05c], and web-surfing models with back buttons
[FKK+00,FKK+01]. More generally, they play an important role in the theory of branching
processes [Har63,AN72], stochastic processes describing the evolution of a population whose
individuals can die and reproduce. The probability of extinction of the population is the least
solution of such a system, a result whose history goes back to [WG74].

Since SPPs have positive coefficients, $\mathbf{x} \le \mathbf{x}'$ implies
$\mathbf{f}(\mathbf{x}) \le \mathbf{f}(\mathbf{x}')$ for $\mathbf{x}, \mathbf{x}' \in \mathbb{R}^n_{\ge 0}$,
i.e., the functions $f_1, \ldots, f_n$ are monotonic. This allows us to apply Kleene's theorem (see
for instance [Kui97]), and conclude that a feasible system $\mathbf{X} = \mathbf{f}(\mathbf{X})$,
i.e., one having at least one nonnegative solution, has a smallest solution $\mu\mathbf{f}$. It
follows easily from standard Galois theory that $\mu\mathbf{f}$ can be irrational and
non-expressible by radicals. The problem of deciding, given an SPP and a rational vector
$\mathbf{v}$ encoded in binary, whether $\mu\mathbf{f} \le \mathbf{v}$ holds, is known to be in
PSPACE, and to be at least as hard as two relevant problems: SQUARE-ROOT-SUM and PosSLP.
SQUARE-ROOT-SUM is a well-known problem of computational geometry, whose membership in NP is a
long standing open question. PosSLP is the problem of deciding, given a division-free
straight-line program, whether it produces a positive integer (see [EY09] for more details).
PosSLP has been recently shown to play a central role in understanding the Blum-Shub-Smale model
of computation, where each single arithmetic operation over the reals can be carried out exactly
and in constant time [ABKPM09].

For the practical applications mentioned above the complexity of determining if $\mu\mathbf{f}$
exceeds a given bound is less relevant than the complexity of, given $i \in \mathbb{N}$, computing
$i$ valid bits of $\mu\mathbf{f}$, i.e., computing a vector $\mathbf{v}$ such that
$|\mu\mathbf{f}_j - v_j| / |\mu\mathbf{f}_j| \le 2^{-i}$ for every $1 \le j \le n$. Given an SPP
$\mathbf{f}$ and $i \in \mathbb{N}$, deciding whether the first $i$ bits of a component of
$\mu\mathbf{f}$, say $\mu\mathbf{f}_1$, are 0, remains as hard as SQUARE-ROOT-SUM and PosSLP. The
reason is that in [EY09] both problems are reduced to the following one: given $\varepsilon > 0$
and an SPP $\mathbf{f}$ for which it is known that either $\mu\mathbf{f}_1 = 1$ or
$\mu\mathbf{f}_1 < \varepsilon$, decide which of the two is the case. So it suffices to take
$\varepsilon = 2^{-i}$.

In this paper we study the problem of computing $i$ valid bits in the Blum-Shub-Smale model.
Since the least fixed point of a feasible SPP $\mathbf{f}$ is a solution of
$\mathbf{F}(\mathbf{X}) = \mathbf{0}$ for $\mathbf{F}(\mathbf{X}) = \mathbf{f}(\mathbf{X}) - \mathbf{X}$,
we can try to apply (the multivariate version of) Newton's method [OR70]: starting at some
$\mathbf{x}^{(0)} \in \mathbb{R}^n$ (we use uppercase to denote variables and lowercase to denote
values), compute the sequence

$$\mathbf{x}^{(k+1)} := \mathbf{x}^{(k)} - \left(\mathbf{F}'(\mathbf{x}^{(k)})\right)^{-1} \mathbf{F}(\mathbf{x}^{(k)})$$

where $\mathbf{F}'(\mathbf{X})$ is the Jacobian matrix of partial derivatives. A first difficulty
is that the method might not even be well-defined, because $\mathbf{F}'(\mathbf{x}^{(k)})$ could be
singular for some $k$. However, Etessami and Yannakakis have recently studied SPPs derived from
probabilistic pushdown automata (actually, from an equivalent model called recursive Markov
chains) [EY09], and shown that a particular version of Newton's method always converges, namely a
version which decomposes the SPP into strongly connected components (SCCs)^1 and applies Newton's
method to them in a bottom-up fashion. Our first result generalizes Etessami and Yannakakis': the
ordinary Newton method converges for arbitrary SPPs, provided that they are clean (which can be
easily achieved).

While these results show that Newton's method can be an adequate algorithm for solving 
SPP equations, they provide no information on the number of iterations needed to compute i 
valid bits. To the best of our knowledge (and perhaps surprisingly), the rest of the literature 
does not contain relevant information either: it has not considered SPPs explicitly, and the 
existing results have very limited interest for SPPs, since they do not apply even for very simple 
and relevant SPP cases (see Related work below). In this paper we obtain upper bounds on 
the number of iterations that Newton's method needs to produce i valid bits, first for strongly 
connected and then for arbitrary SPP equations. 

For strongly connected SPP equations $\mathbf{X} = \mathbf{f}(\mathbf{X})$ we prove the existence
of a threshold $k_f$ such that for every $i \ge 0$ the $(k_f + i)$-th iteration of Newton's method
has at least $i$ valid bits of $\mu\mathbf{f}$. So, loosely speaking, after $k_f$ iterations
Newton's method is guaranteed to compute at least 1 new bit of the solution per iteration; we say
that Newton's method converges at least linearly with rate 1. Moreover, we show that the threshold
$k_f$ can be chosen as

$$k_f = \lceil 4mn + 3n \max\{0, -\log \mu_{\min}\} \rceil$$

where $n$ is the number of polynomials of the strongly connected SPP, $m$ is such that all
coefficients of the SPP can be given as ratios of $m$-bit integers, and $\mu_{\min}$ is the minimal
component of the least fixed point $\mu\mathbf{f}$.

Notice that $k_f$ depends on $\mu\mathbf{f}$, which is what Newton's method should compute. For
this reason we also obtain bounds on $k_f$ depending only on $m$ and $n$. We show that for
arbitrary strongly connected SPP equations $k_f = 4mn2^n$ is also a valid threshold. For SPP
equations coming from stochastic models, such as the ones listed above, we do far better. First,
we show that if every procedure has a non-zero probability of terminating (a condition that always
holds for back-button processes [FKK+00,FKK+01]), then a valid threshold is $k_f = 2m(n+1)$. Since
one iteration requires $O(n^3)$ arithmetic operations in a system of $n$ equations, we immediately
obtain an upper bound on the time complexity of Newton's method in the Blum-Shub-Smale model: for
back-button processes, $i$ valid bits can be computed in time $O(mn^4 + in^3)$. Second, we observe
that, since $\mathbf{x}^{(k)} \le \mathbf{x}^{(k+1)} \le \mu\mathbf{f}$ holds for every $k \ge 0$,
as Newton's method proceeds it provides better and better lower bounds for $\mu_{\min}$ and thus
better upper bounds for $k_f$. We exhibit an SPP for which, using this fact and our theorem, we can
prove that no component of the solution reaches the value 1. This cannot be proved by just
computing more iterations, no matter how many.

For general SPP equations, not necessarily strongly connected, we show that Newton's method still
converges linearly. Formally, we show the existence of a threshold $k_f$ and a real number
$0 < \alpha_f$ such that for every $i \ge 0$ the $(k_f + \alpha_f \cdot i)$-th iteration of
Newton's method has at least $i$ valid bits of $\mu\mathbf{f}$. So, loosely speaking, after the
first $k_f$ iterations Newton's method computes new bits of $\mu\mathbf{f}$ at a rate of at least
$1/\alpha_f$ bits per iteration. Unlike the strongly connected case, the proof does not provide any
bound on the threshold $k_f$: with respect to the threshold the proof is non-constructive, and
finding a bound on $k_f$ is still an open problem. However, the proof does provide a bound for
$\alpha_f$: it shows $\alpha_f \le n \cdot 2^n$ for an SPP with $n$ polynomials. We also exhibit a
family of SPPs for which more than $i \cdot 2^{n-1}$ iterations are needed to compute $i$ bits. So
$\alpha_f \le n \cdot 2^n$ for every system $\mathbf{f}$, and there exists a family of systems for
which $2^{n-1} \le \alpha_f$.

^1 Loosely speaking, a subset of variables and their associated equations form an SCC if the value
of any variable in the subset influences the value of all variables in the subset; see § 2 for
details.

Finally, the last result of the paper concerns the geometric interpretation of Newton's method
for SPP equations. We show that, loosely speaking, the Newton approximants stay within the
hypervolume limited by the hypersurfaces corresponding to each individual equation. This means
that a simple geometric intuition of how Newton's method works, extracted from the case of
2-dimensional SPPs, is also correct for arbitrary dimensions. As a byproduct we also obtain a
new variant of Newton's method.

Related work. There is a large body of literature on the convergence speed of Newton's method
for arbitrary systems of differentiable functions. A comprehensive reference is Ortega and
Rheinboldt's book [OR70] (see also Chapter 8 of Ortega's course [Ort72] or Chapter 5 of [Kel95]
for a brief summary). Several theorems (for instance Theorem 8.1.10 of [Ort72]) prove that the
number of valid bits grows linearly, superlinearly, or even exponentially in the number of
iterations, but only under the hypothesis that $\mathbf{F}'(\mathbf{x})$ is non-singular
everywhere, in a neighborhood of $\mu\mathbf{f}$, or at least at the point $\mu\mathbf{f}$ itself.
However, the matrix $\mathbf{F}'(\mu\mathbf{f})$ can be singular for an SPP, even for the
1-dimensional SPP $f(X) = \tfrac{1}{2}X^2 + \tfrac{1}{2}$.
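The singular 1-dimensional case can be checked by hand: for $f(X) = \tfrac{1}{2}X^2 + \tfrac{1}{2}$ the least fixed point is 1 and $F'(1) = f'(1) - 1 = 0$. The following minimal Python sketch (the function names are ours, chosen for illustration) runs Newton's method from 0 with exact rational arithmetic. Each step simplifies to $x \mapsto x + (1 - x)/2$, so the error is halved per iteration, which is linear rather than quadratic progress.

```python
from fractions import Fraction


def f(x):
    """The 1-dimensional SPP f(X) = 1/2 X^2 + 1/2; its least fixed point is 1."""
    return x * x / 2 + Fraction(1, 2)


def f_prime(x):
    """Derivative of f."""
    return x


def newton_step(x):
    """One Newton step for F(X) = f(X) - X, i.e. x + (f(x) - x) / (1 - f'(x))."""
    return x + (f(x) - x) / (1 - f_prime(x))


def newton_sequence(steps):
    """Newton iterates starting at 0, computed exactly with rationals."""
    xs = [Fraction(0)]
    for _ in range(steps):
        xs.append(newton_step(xs[-1]))
    return xs


if __name__ == "__main__":
    for k, x in enumerate(newton_sequence(6)):
        # The distance to the fixed point 1 is exactly 2**-k.
        print(k, x, "error:", 1 - x)
```

Exact rationals are used so that the halving of the error is visible without floating-point noise.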

The general case in which $\mathbf{F}'(\mu\mathbf{f})$ may be singular for the solution
$\mu\mathbf{f}$ that the method converges to has been thoroughly studied. In a seminal paper
[Red78], Reddien shows that under certain conditions, the main ones being that the kernel of
$\mathbf{F}'(\mu\mathbf{f})$ has dimension 1 and that the initial point is close enough to the
solution, Newton's method gains 1 bit per iteration. Decker and Kelley obtain results for kernels
of arbitrary dimension, but they require a certain linear map $B(\mathbf{x})$ to be non-singular
for all $\mathbf{x} \ne \mathbf{0}$ [DK80]. Griewank observes in [GO81] that the non-singularity of
$B(\mathbf{x})$ is in fact a strong condition which, in particular, can only be satisfied by
kernels of even dimension. He presents a weaker sufficient condition for linear convergence
requiring $B(\mathbf{x})$ to be non-singular only at the initial point $\mathbf{x}^{(0)}$, i.e., it
only requires to make "the right guess" for $\mathbf{x}^{(0)}$. Unfortunately, none of these
results can be directly applied to arbitrary SPPs. The possible dimensions of the kernel of
$\mathbf{F}'(\mu\mathbf{f})$ for an SPP $\mathbf{f}(\mathbf{X})$ are to the best of our knowledge
unknown, and deciding this question seems as hard as those related to the convergence rate^2.
Griewank's result does not apply to the decomposed Newton's method either because the mapping
$B(\mathbf{x}^{(0)})$ is always singular for $\mathbf{x}^{(0)} = \mathbf{0}$.

Kantorovich's famous theorem (see e.g. Theorem 8.2.6 of [OR70] and [PP80] for an improvement)
guarantees global convergence and only requires $\mathbf{F}'$ to be non-singular at
$\mathbf{x}^{(0)}$. However, it also requires to find a Lipschitz constant for $\mathbf{F}'$ on a
suitable region and some other bounds on $\mathbf{F}'$. These latter conditions are far too
restrictive for the applications mentioned above. For instance, the stochastic context-free
grammars whose associated SPPs satisfy Kantorovich's conditions cannot exhibit two productions
$X \to aYZ$ and $W \to \varepsilon$ such that
$\mathit{Prob}(X \to aYZ) \cdot \mathit{Prob}(W \to \varepsilon) > 1/4$. This class of grammars is
too contrived to be of use.

Summarizing, while the convergence of Newton's method for systems of differentiable functions
has been intensely studied, the case of SPPs does not seem to have been considered yet.
The results obtained for other classes have very limited applicability to SPPs: either they do
not apply at all, or only apply to contrived SPP subclasses. Moreover, these results only provide
information about the growth rate of the number of accurate bits, but not about the number
itself. For the class of strongly connected SPPs, our thresholds lead to explicit lower bounds for
the number of accurate bits depending only on syntactical parameters: the number of equations
and the size of the coefficients. For arbitrary SPPs we prove the existence of a threshold, while
finding explicit lower bounds remains an open problem.

^2 More precisely, SPPs with kernels of arbitrary dimension exist, but the cases we know of can be
trivially reduced to SPPs with kernels of dimension 1.

Structure of the paper. § 2 defines SPPs and briefly describes their applications to stochastic 
systems. § 3 presents a short summary of our main theorems. § 4 proves some fundamental 
properties of Newton's method for SPP equations. § 5 and § 6 contain our results on the con- 
vergence speed for strongly connected and general SPP equations, respectively. § 7 shows that 
the bounds are essentially tight. § 8 presents our results about the geometrical interpretation of 
Newton's method, and § 9 contains conclusions. 

2 Preliminaries 

In this section we introduce our notation used in the following and formalize the concepts 
mentioned in the introduction. 



2.1 Notation 

As usual, $\mathbb{R}$ and $\mathbb{N}$ denote the set of real, respectively natural numbers. We
assume $0 \in \mathbb{N}$. $\mathbb{R}^n$ denotes the set of $n$-dimensional real valued column
vectors and $\mathbb{R}^n_{\ge 0}$ the subset of vectors with nonnegative components. We use bold
letters for vectors, e.g. $\mathbf{x} \in \mathbb{R}^n$, where we assume that $\mathbf{x}$ has the
components $x_1, \ldots, x_n$. Similarly, the $i$-th component of a function
$\mathbf{f} : \mathbb{R}^n \to \mathbb{R}^n$ is denoted by $f_i$. We define
$\mathbf{0} := (0, \ldots, 0)^\top$ and $\mathbf{1} := (1, \ldots, 1)^\top$ where the superscript
$^\top$ indicates the transpose of a vector or a matrix. Let $\|\cdot\|$ denote some norm on
$\mathbb{R}^n$. Sometimes we use explicitly the maximum norm $\|\cdot\|_\infty$ with
$\|\mathbf{x}\|_\infty := \max_{1 \le i \le n} |x_i|$.

The partial order $\le$ on $\mathbb{R}^n$ is defined as usual by setting
$\mathbf{x} \le \mathbf{y}$ if $x_i \le y_i$ for all $1 \le i \le n$. Similarly,
$\mathbf{x} < \mathbf{y}$ if $\mathbf{x} \le \mathbf{y}$ and $\mathbf{x} \ne \mathbf{y}$. Finally,
we write $\mathbf{x} \prec \mathbf{y}$ if $x_i < y_i$ for all $1 \le i \le n$, i.e., if every
component of $\mathbf{x}$ is smaller than the corresponding component of $\mathbf{y}$.

We use $X_1, \ldots, X_n$ as variable identifiers and arrange them into the vector $\mathbf{X}$.
In the following $n$ always denotes the number of variables, i.e., the dimension of $\mathbf{X}$.
While $\mathbf{x}, \mathbf{y}, \ldots$ denote arbitrary elements in $\mathbb{R}^n$, we write
$\mathbf{X}$ if we want to emphasize that a function is given w.r.t. these variables. Hence,
$\mathbf{f}(\mathbf{X})$ represents the function itself, whereas $\mathbf{f}(\mathbf{x})$ denotes
its value for some $\mathbf{x} \in \mathbb{R}^n$.

If $S \subseteq \{1, \ldots, n\}$ is a set of components and $\mathbf{x}$ a vector, then by
$\mathbf{x}_S$ we mean the vector obtained by restricting $\mathbf{x}$ to the components in $S$.

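Let $S \subseteq \{1, \ldots, n\}$ and $\overline{S} = \{1, \ldots, n\} \setminus S$. Given a
function $\mathbf{f}(\mathbf{X})$ and a vector $\mathbf{x}_S$, then $\mathbf{f}[S/\mathbf{x}_S]$ is
obtained by replacing, for each $s \in S$, each occurrence of $X_s$ by $x_s$ and removing the
$s$-component. In other words, if
$\mathbf{f}(\mathbf{X}) = \mathbf{f}(\mathbf{X}_S, \mathbf{X}_{\overline{S}})$ then
$\mathbf{f}[S/\mathbf{x}_S](\mathbf{y}_{\overline{S}}) = \mathbf{f}_{\overline{S}}(\mathbf{x}_S, \mathbf{y}_{\overline{S}})$.
For instance, if $\mathbf{f}(X_1, X_2) = (X_1 X_2 + \tfrac{1}{2},\ X_2^2 + \tfrac{1}{5})^\top$, then
$\mathbf{f}[\{2\}/\tfrac{1}{2}] : \mathbb{R} \to \mathbb{R},\ X_1 \mapsto \tfrac{1}{2} X_1 + \tfrac{1}{2}$.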

$\mathbb{R}^{m \times n}$ denotes the set of matrices having $m$ rows and $n$ columns. The
transpose of a vector or matrix is indicated by the superscript $^\top$. The identity matrix of
$\mathbb{R}^{n \times n}$ is denoted by $\mathrm{Id}$.

The formal Neumann series of $A \in \mathbb{R}^{n \times n}$ is defined by
$A^* = \sum_{k \in \mathbb{N}} A^k$. It is well-known that $A^*$ exists if and only if the spectral
radius of $A$ is less than 1, i.e.
$\max\{|\lambda| \mid \mathbb{C} \ni \lambda \text{ is an eigenvalue of } A\} < 1$. If $A^*$ exists
then $A^* = (\mathrm{Id} - A)^{-1}$.
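As a numerical illustration of the identity $A^* = (\mathrm{Id} - A)^{-1}$, the sketch below sums the first terms of the Neumann series for a small matrix of our own choosing (its spectral radius is 1/2) and checks that multiplying the partial sum by $\mathrm{Id} - A$ gives approximately the identity. The matrix `A` and all helper names are illustrative assumptions, not taken from the text.

```python
def identity(n):
    """The n x n identity matrix as a list of rows."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def mat_mul(a, b):
    """Product of two square matrices of equal size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_add(a, b):
    """Entrywise sum of two matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def neumann_partial_sum(a, terms):
    """Return Id + A + A^2 + ... + A^(terms - 1)."""
    n = len(a)
    total = [[0.0] * n for _ in range(n)]
    power = identity(n)
    for _ in range(terms):
        total = mat_add(total, power)
        power = mat_mul(power, a)
    return total


def residual_to_identity(a, star):
    """Largest entry of |(Id - A) * star - Id|; near 0 when star approximates A*."""
    n = len(a)
    id_minus_a = [[(1.0 if i == j else 0.0) - a[i][j] for j in range(n)]
                  for i in range(n)]
    product = mat_mul(id_minus_a, star)
    return max(abs(product[i][j] - (1.0 if i == j else 0.0))
               for i in range(n) for j in range(n))


# Illustrative matrix with eigenvalue 1/2 (twice), so the series converges.
A = [[0.5, 0.25], [0.0, 0.5]]

if __name__ == "__main__":
    star = neumann_partial_sum(A, 200)
    print(star, residual_to_identity(A, star))
```

For a matrix with spectral radius at least 1 the partial sums would instead grow without bound in at least one entry.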

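The partial derivative of a function $f(\mathbf{X}) : \mathbb{R}^n \to \mathbb{R}$ w.r.t. the
variable $X_i$ is denoted by $\partial_{X_i} f$. The gradient $\nabla f$ of $f(\mathbf{X})$ is then
defined to be the (row) vector

$$\nabla f := (\partial_{X_1} f, \ldots, \partial_{X_n} f).$$

The Jacobian of a function $\mathbf{f}(\mathbf{X})$ with
$\mathbf{f} : \mathbb{R}^n \to \mathbb{R}^m$ is the matrix $\mathbf{f}'(\mathbf{X})$ defined by

$$\mathbf{f}' = \begin{pmatrix} \partial_{X_1} f_1 & \cdots & \partial_{X_n} f_1 \\ \vdots & & \vdots \\ \partial_{X_1} f_m & \cdots & \partial_{X_n} f_m \end{pmatrix},$$

i.e., the $i$-th row of $\mathbf{f}'$ is the gradient of $f_i$.

2.2 Systems of Positive Polynomials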

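Definition 2.1. A function $\mathbf{f}(\mathbf{X})$ with
$\mathbf{f} : \mathbb{R}^n_{\ge 0} \to \mathbb{R}^n_{\ge 0}$ is a system of positive polynomials
(SPP), if every component $f_i(\mathbf{X})$ is a polynomial in the variables $X_1, \ldots, X_n$
with coefficients in $\mathbb{R}_{\ge 0}$. We call an SPP $\mathbf{f}(\mathbf{X})$ feasible if
$\mathbf{y} = \mathbf{f}(\mathbf{y})$ for some $\mathbf{y} \in \mathbb{R}^n_{\ge 0}$. An SPP is
called linear (resp. quadratic) if all polynomials have degree at most 1 (resp. 2).

Fact 2.2. Every SPP $\mathbf{f}$ is monotone on $\mathbb{R}^n_{\ge 0}$, i.e. for
$\mathbf{0} \le \mathbf{x} \le \mathbf{y}$ we have $\mathbf{f}(\mathbf{x}) \le \mathbf{f}(\mathbf{y})$.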

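We will need the following lemma, a version of Taylor's theorem.

Lemma 2.3 (Taylor). Let $\mathbf{f}$ be an SPP and $\mathbf{x}, \mathbf{u} \ge \mathbf{0}$. Then

$$\mathbf{f}(\mathbf{x}) + \mathbf{f}'(\mathbf{x})\mathbf{u} \le \mathbf{f}(\mathbf{x} + \mathbf{u}) \le \mathbf{f}(\mathbf{x}) + \mathbf{f}'(\mathbf{x} + \mathbf{u})\mathbf{u}.$$

Proof. It suffices to show this for a multivariate polynomial $f(\mathbf{X})$ with nonnegative
coefficients. Consider $g(t) = f(\mathbf{x} + t\mathbf{u})$. We then have

$$f(\mathbf{x} + \mathbf{u}) = g(1) = g(0) + \int_0^1 g'(s)\,ds = f(\mathbf{x}) + \int_0^1 f'(\mathbf{x} + s\mathbf{u})\mathbf{u}\,ds.$$

The result follows as $f'(\mathbf{x}) \le f'(\mathbf{x} + s\mathbf{u}) \le f'(\mathbf{x} + \mathbf{u})$
for $s \in [0, 1]$. □

Since every SPP is continuous, Kleene's fixed-point theorem (see e.g. [Kui97]) applies.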

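Theorem 2.4 (Kleene's fixed-point theorem). Every feasible SPP $\mathbf{f}$ has a least fixed
point $\mu\mathbf{f}$ in $\mathbb{R}^n_{\ge 0}$, i.e., $\mu\mathbf{f} = \mathbf{f}(\mu\mathbf{f})$
and, in addition, $\mathbf{y} = \mathbf{f}(\mathbf{y})$ implies $\mu\mathbf{f} \le \mathbf{y}$.
Moreover, the sequence $(\boldsymbol{\kappa}_f^{(k)})_{k \in \mathbb{N}}$ with
$\boldsymbol{\kappa}_f^{(k)} := \mathbf{f}^k(\mathbf{0})$ (where $\mathbf{f}^k$ denotes the
$k$-fold iteration of $\mathbf{f}$) is monotonically increasing with respect to $\le$ (i.e.
$\boldsymbol{\kappa}_f^{(k)} \le \boldsymbol{\kappa}_f^{(k+1)}$) and converges to $\mu\mathbf{f}$.

In the following we call $(\boldsymbol{\kappa}_f^{(k)})_{k \in \mathbb{N}}$ the Kleene sequence of
$\mathbf{f}(\mathbf{X})$, and drop the subscript whenever $\mathbf{f}$ is clear from the context.
Similarly, we sometimes write $\mu$ instead of $\mu\mathbf{f}$.

An SPP $\mathbf{f}(\mathbf{X})$ is clean if for all variables $X_i$ there is a
$k \in \mathbb{N}$ such that $\kappa_i^{(k)} > 0$. It is easy to see that we have
$\kappa_i^{(k)} = 0$ for all $k \in \mathbb{N}$ if $\kappa_i^{(n)} = 0$. So we can "clean" an SPP
$\mathbf{f}(\mathbf{X})$ in time linear in the size of $\mathbf{f}$ by determining the components
$i$ with $\kappa_i^{(n)} = 0$ and removing them.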
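The cleaning step only needs to know which components of the Kleene sequence ever become positive, not their values. The sketch below propagates positivity for at most $n$ rounds. It is a simple quadratic-time illustration rather than the linear-time procedure mentioned above, and the encoding of an SPP as lists of `(coefficient, variables)` monomials as well as the example system are assumptions made for this sketch.

```python
def positive_components(spp):
    """Indices i with kappa_i^(k) > 0 for some k, found by at most n propagation rounds.

    `spp` is a list of polynomials; each polynomial is a list of monomials
    (coefficient, variables), where `variables` is a tuple of 0-based variable
    indices (an index is repeated for higher powers).
    """
    n = len(spp)
    positive = set()  # kappa^(0) = 0 has no positive component
    for _ in range(n):
        # Component i is positive in the next Kleene iterate iff some monomial
        # of f_i has a positive coefficient and only positive variables.
        updated = {
            i
            for i, polynomial in enumerate(spp)
            if any(c > 0 and all(v in positive for v in variables)
                   for c, variables in polynomial)
        }
        if updated == positive:
            break
        positive = updated
    return positive


def clean(spp):
    """Drop components that stay 0 and every monomial that mentions them."""
    keep = sorted(positive_components(spp))
    new_index = {old: new for new, old in enumerate(keep)}
    return [
        [(c, tuple(new_index[v] for v in variables))
         for c, variables in spp[old]
         if all(v in new_index for v in variables)]
        for old in keep
    ]


# Hypothetical example: X1 = 0.5*X1*X2 + 0.5, X2 = 0.3*X2^2, X3 = X1*X2.
EXAMPLE = [
    [(0.5, (0, 1)), (0.5, ())],
    [(0.3, (1, 1))],
    [(1.0, (0, 1))],
]

if __name__ == "__main__":
    print(positive_components(EXAMPLE))
    print(clean(EXAMPLE))
```

In the example only the first component ever becomes positive, so cleaning leaves the single equation with its constant term.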
We will also need the notion of dependence between variables.

Definition 2.5. A polynomial $f(\mathbf{X})$ contains a variable $X_i$ if
$\partial_{X_i} f(\mathbf{X})$ is not the zero-polynomial.

Definition 2.6. Let $\mathbf{f}(\mathbf{X})$ be an SPP. A component $i$ depends directly on a
component $k$ if $f_i(\mathbf{X})$ contains $X_k$. A component $i$ depends on $k$ if either $i$
depends directly on $k$ or there is a component $j$ such that $i$ depends on $j$ and $j$ depends
on $k$. The components $\{1, \ldots, n\}$ can be partitioned into strongly connected components
(SCCs) where an SCC $S$ is a maximal set of components such that each component in $S$ depends on
each other component in $S$. An SCC is called trivial if it consists of a single component that
does not depend on itself. An SPP is strongly connected (short: an scSPP) if $\{1, \ldots, n\}$ is
a non-trivial SCC.
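The dependence relation is a reachability relation on a directed graph, so whether an SPP is an scSPP can be tested with a plain graph search. The following sketch uses an assumed encoding of polynomials as lists of `(coefficient, variables)` monomials with 0-based indices, and tests it on the three-equation back-button system that appears later as Example 2.9. All function names are illustrative.

```python
def direct_dependencies(spp):
    """Map each component i to the set of k such that f_i contains X_k."""
    return [
        {v for c, variables in polynomial if c != 0 for v in variables}
        for polynomial in spp
    ]


def dependencies(spp):
    """Map each component i to all k that i depends on (transitive closure)."""
    direct = direct_dependencies(spp)
    closure = []
    for i in range(len(spp)):
        reached, stack = set(), list(direct[i])
        while stack:
            k = stack.pop()
            if k not in reached:
                reached.add(k)
                stack.extend(direct[k])
        closure.append(reached)
    return closure


def is_strongly_connected(spp):
    """True iff {1, ..., n} is a non-trivial SCC: every i depends on every k, itself included."""
    everything = set(range(len(spp)))
    return all(reached == everything for reached in dependencies(spp))


# X1 = 0.4*X2*X1 + 0.6, X2 = 0.3*X1*X2 + 0.4*X3*X2 + 0.3, X3 = 0.3*X1*X3 + 0.7
BACK_BUTTON = [
    [(0.4, (1, 0)), (0.6, ())],
    [(0.3, (0, 1)), (0.4, (2, 1)), (0.3, ())],
    [(0.3, (0, 2)), (0.7, ())],
]

if __name__ == "__main__":
    print(dependencies(BACK_BUTTON))
    print(is_strongly_connected(BACK_BUTTON))
```

Requiring each component to reach itself excludes the trivial SCC, since a single component without a self-dependence reaches nothing.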
2.3 Convergence Speed

We will analyze the convergence speed of Newton's method. To this end we need the notion of
valid bits.

Definition 2.7. Let $\mathbf{f}$ be a feasible SPP. A vector $\mathbf{x}$ has $i$ valid bits of
the least fixed point $\mu\mathbf{f}$ if

$$\frac{|\mu\mathbf{f}_j - x_j|}{|\mu\mathbf{f}_j|} \le 2^{-i}$$

for every $1 \le j \le n$. Let $(\mathbf{x}^{(k)})_{k \in \mathbb{N}}$ be a sequence with
$\mathbf{0} \le \mathbf{x}^{(k)} \le \mu\mathbf{f}$. Then the convergence order
$\beta : \mathbb{N} \to \mathbb{N}$ of the sequence $(\mathbf{x}^{(k)})_{k \in \mathbb{N}}$ is
defined as follows: $\beta(k)$ is the greatest natural number $i$ such that $\mathbf{x}^{(k)}$ has
$i$ valid bits (or $\infty$ if such a greatest number does not exist). We will always mean the
convergence order of the Newton sequence $(\boldsymbol{\nu}^{(k)})_{k \in \mathbb{N}}$, unless
explicitly stated otherwise.

We say that a sequence has linear, exponential, logarithmic, etc. convergence order if the
function $\beta(k)$ grows linearly, exponentially, or logarithmically in $k$, respectively.
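Definition 2.7 translates directly into a small function. The sketch below (the name `valid_bits` is ours) returns the greatest $i$ with relative error at most $2^{-i}$ in every component, assuming all components of $\mu\mathbf{f}$ are nonzero. It doubles the worst relative error until it exceeds 1, which avoids rounding issues of a floating-point logarithm and works for exact rationals too.

```python
import math
from fractions import Fraction


def valid_bits(x, mu):
    """Greatest natural i with |mu_j - x_j| / |mu_j| <= 2**-i for all j.

    Returns math.inf when x equals mu. Assumes every mu_j is nonzero and
    that the relative error is at most 1 (true for 0 <= x <= mu).
    """
    worst = max(abs(m - xj) / abs(m) for xj, m in zip(x, mu))
    if worst == 0:
        return math.inf
    if worst > 1:
        raise ValueError("relative error exceeds 1: not even 0 valid bits")
    bits = 0
    while worst * 2 <= 1:  # worst <= 2**-(bits + 1)
        worst *= 2
        bits += 1
    return bits


if __name__ == "__main__":
    # For a sequence x_k = 1 - 2**-k approaching mu = 1, beta(k) = k.
    mu = [Fraction(1)]
    for k in range(6):
        print(k, valid_bits([1 - Fraction(1, 2 ** k)], mu))
```

The demo sequence $1 - 2^{-k}$ is the Newton sequence of the 1-dimensional SPP $f(X) = \tfrac{1}{2}X^2 + \tfrac{1}{2}$, whose convergence order is therefore exactly linear.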

Remark 2.8. Our definition of convergence order differs from the one commonly used in numerical
analysis (see e.g. [OR70]), where "quadratic convergence" or "Q-quadratic convergence"
means that the error $e'$ of the new approximant (its distance to the least fixed point according
to some norm) is bounded by $c \cdot e^2$, where $e$ is the error of the old approximant and
$c > 0$ is some constant. We consider our notion more natural from a computational point of view,
since it directly relates the number of iterations to the accuracy of the approximation. Notice
that "quadratic convergence" implies exponential convergence order in the sense of Definition 2.7.
In the following we avoid the notion of "quadratic convergence".

2.4 Stochastic Models 

As mentioned in the introduction, several problems concerning stochastic models can be reduced
to problems about the least fixed point $\mu\mathbf{f}$ of an SPP $\mathbf{f}$. In these cases,
$\mu\mathbf{f}$ is a vector of probabilities, and so $\mu\mathbf{f} \le \mathbf{1}$.

Probabilistic Pushdown Automata. Our study of SPPs was initially motivated by the verification
of probabilistic pushdown automata. A probabilistic pushdown automaton (pPDA) is a tuple
$\mathcal{P} = (Q, \Gamma, \delta, \mathit{Prob})$ where $Q$ is a finite set of control states,
$\Gamma$ is a finite stack alphabet, $\delta \subseteq Q \times \Gamma \times Q \times \Gamma^*$ is
a finite transition relation (we write $pX \hookrightarrow q\alpha$ instead of
$(p, X, q, \alpha) \in \delta$), and $\mathit{Prob}$ is a function which to each transition
$pX \hookrightarrow q\alpha$ assigns its probability
$\mathit{Prob}(pX \hookrightarrow q\alpha) \in (0, 1]$ so that for all $p \in Q$ and
$X \in \Gamma$ we have $\sum_{pX \hookrightarrow q\alpha} \mathit{Prob}(pX \hookrightarrow q\alpha) = 1$.
We write $pX \overset{x}{\hookrightarrow} q\alpha$ instead of
$\mathit{Prob}(pX \hookrightarrow q\alpha) = x$. A configuration of $\mathcal{P}$ is a pair $qw$,
where $q$ is a control state and $w \in \Gamma^*$ is a stack content. A pPDA $\mathcal{P}$
naturally induces a possibly infinite Markov chain with the configurations as states and
transitions given by: $pX\beta \xrightarrow{x} q\alpha\beta$ for every $\beta \in \Gamma^*$ iff
$pX \overset{x}{\hookrightarrow} q\alpha$. We assume w.l.o.g. that if
$pX \overset{x}{\hookrightarrow} q\alpha$ is a transition then $|\alpha| \le 2$.

pPDAs and the equivalent model of recursive Markov chains have been very thoroughly
studied [EKM04,BKS05,EY09,EY05a,EKM05,EY05b,EY05c]. This work has shown that the
key to the analysis of pPDAs are the termination probabilities $[pXq]$, where $p$ and $q$ are
states, and $X$ is a stack letter, defined as follows (see e.g. [EKM04] for a more formal
definition): $[pXq]$ is the probability that, starting at the configuration $pX$, the pPDA
eventually reaches the configuration $q\varepsilon$ (empty stack). It is not difficult to show that
the vector of these probabilities is the least solution of the SPP equation system containing the
equation

$$\langle pXq \rangle = \sum_{pX \overset{x}{\hookrightarrow} rYZ} x \cdot \sum_{t \in Q} \langle rYt \rangle \cdot \langle tZq \rangle + \sum_{pX \overset{x}{\hookrightarrow} rY} x \cdot \langle rYq \rangle + \sum_{pX \overset{x}{\hookrightarrow} q\varepsilon} x$$

for each triple $(p, X, q)$. Call this quadratic SPP the termination SPP of the pPDA (we assume
that termination SPPs are clean, and it is easy to see that they are always feasible).

Strict pPDAs and Back-Button Processes. A pPDA is strict if for all $pX \in Q \times \Gamma$ and
all $q \in Q$ the transition relation contains a pop-rule
$pX \overset{x}{\hookrightarrow} q\varepsilon$ for some $x > 0$. Essentially, strict pPDAs model
programs in which every procedure has at least one terminating execution that does not call any
other procedure. The termination SPP of a strict pPDA satisfies
$\mathbf{f}(\mathbf{0}) \succ \mathbf{0}$.

In [FKK+00,FKK+01] a class of stochastic processes is introduced to model the behavior
of web-surfers who from the current webpage $A$ can decide either to follow a link to another
page, say $B$, with probability $\ell_{AB}$, or to press the "back button" with nonzero probability
$b_A$. These back-button processes correspond to a very special class of strict pPDAs having one
single control state (which in the following we omit), and rules of the form
$A \overset{b_A}{\hookrightarrow} \varepsilon$ (press the back button from $A$) or
$A \overset{\ell_{AB}}{\hookrightarrow} BA$ (follow the link from $A$ to $B$, remembering $A$ as
destination of pressing the back button at $B$). The termination probabilities are given by an SPP
equation system containing the equation

$$\langle A \rangle = b_A + \sum_{A \overset{\ell_{AB}}{\hookrightarrow} BA} \ell_{AB} \langle B \rangle \langle A \rangle = b_A + \langle A \rangle \sum_{A \overset{\ell_{AB}}{\hookrightarrow} BA} \ell_{AB} \langle B \rangle$$

for every webpage $A$. In [FKK+00,FKK+01] those termination probabilities are called revocation
probabilities. The revocation probability of a page $A$ is the probability that, when currently
visiting webpage $A$ and having $H_0 H_1 \ldots H_{n-1} H_n$ as the browser history of previously
visited pages, then during subsequent surfing from $A$ the random user eventually returns to
webpage $H_n$ with $H_0 H_1 \ldots H_{n-1}$ as the remaining browser history.

Example 2.9. Consider the following equation system.

$$\begin{pmatrix} X_1 \\ X_2 \\ X_3 \end{pmatrix} = \begin{pmatrix} 0.4 X_2 X_1 + 0.6 \\ 0.3 X_1 X_2 + 0.4 X_3 X_2 + 0.3 \\ 0.3 X_1 X_3 + 0.7 \end{pmatrix}$$

The least solution of the system gives the revocation probabilities of a back-button process with
three web-pages. For instance, if the surfer is at page 2 it can choose between following links
to pages 1 and 3 with probabilities 0.3 and 0.4, respectively, or pressing the back button with
probability 0.3.

3 Newton's Method and an Overview of Our Results 

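In order to approximate the least fixed point $\mu\mathbf{f}$ of an SPP $\mathbf{f}$ we employ
Newton's method:

Definition 3.1. Let $\mathbf{f}$ be a clean and feasible SPP. The Newton operator
$\mathcal{N}_f$ is defined as follows:

$$\mathcal{N}_f(\mathbf{X}) := \mathbf{X} + (\mathrm{Id} - \mathbf{f}'(\mathbf{X}))^{-1} (\mathbf{f}(\mathbf{X}) - \mathbf{X})$$

The sequence $(\boldsymbol{\nu}_f^{(k)})_{k \in \mathbb{N}}$ with
$\boldsymbol{\nu}_f^{(k)} = \mathcal{N}_f^k(\mathbf{0})$ (where $\mathcal{N}_f^k$ denotes the
$k$-fold iteration of $\mathcal{N}_f$) is called Newton sequence. We drop the subscript of
$\mathcal{N}_f$ and $\boldsymbol{\nu}_f^{(k)}$ when $\mathbf{f}$ is understood.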
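To make the Newton operator concrete, the following sketch applies it to the three-equation back-button system of Example 2.9, starting at $\mathbf{0}$. It is a plain floating-point illustration, not an implementation in the Blum-Shub-Smale model; the hand-written Jacobian, the small Gaussian-elimination helper `solve`, and all other names are our own. Instead of inverting $\mathrm{Id} - \mathbf{f}'(\mathbf{x})$ it solves the linear system $(\mathrm{Id} - \mathbf{f}'(\mathbf{x}))\,\boldsymbol{\delta} = \mathbf{f}(\mathbf{x}) - \mathbf{x}$, which gives the same update.

```python
def f(x):
    """The SPP of Example 2.9."""
    x1, x2, x3 = x
    return [0.4 * x2 * x1 + 0.6,
            0.3 * x1 * x2 + 0.4 * x3 * x2 + 0.3,
            0.3 * x1 * x3 + 0.7]


def jacobian(x):
    """Jacobian f'(x): row i is the gradient of f_i."""
    x1, x2, x3 = x
    return [[0.4 * x2, 0.4 * x1, 0.0],
            [0.3 * x2, 0.3 * x1 + 0.4 * x3, 0.4 * x2],
            [0.3 * x3, 0.0, 0.3 * x1]]


def solve(a, b):
    """Solve a * y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    y = [0.0] * n
    for r in reversed(range(n)):
        partial = sum(m[r][c] * y[c] for c in range(r + 1, n))
        y[r] = (m[r][n] - partial) / m[r][r]
    return y


def newton_step(x):
    """One application of the Newton operator: x + (Id - f'(x))^-1 (f(x) - x)."""
    n = len(x)
    fx, jac = f(x), jacobian(x)
    id_minus_jac = [[(1.0 if i == j else 0.0) - jac[i][j] for j in range(n)]
                    for i in range(n)]
    delta = solve(id_minus_jac, [fx[i] - x[i] for i in range(n)])
    return [x[i] + delta[i] for i in range(n)]


def newton_sequence(steps):
    """Newton iterates nu^(0) = 0, nu^(1), ..., nu^(steps)."""
    xs = [[0.0, 0.0, 0.0]]
    for _ in range(steps):
        xs.append(newton_step(xs[-1]))
    return xs


if __name__ == "__main__":
    for k, x in enumerate(newton_sequence(12)):
        print(k, x)
```

The first step from $\mathbf{0}$ returns $\mathbf{f}(\mathbf{0})$, since the Jacobian of this system vanishes at the origin; later iterates increase componentwise, up to rounding.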
The main results of this paper concern the application of Newton's method to SPPs. We
summarize them in this section.

Theorem 4.1 states that the Newton sequence $(\boldsymbol{\nu}^{(k)})_{k \in \mathbb{N}}$ is
well-defined (i.e., the inverse matrices $(\mathrm{Id} - \mathbf{f}'(\boldsymbol{\nu}^{(k)}))^{-1}$
exist for every $k \in \mathbb{N}$), monotonically increasing and bounded from above by
$\mu\mathbf{f}$ (i.e.
$\boldsymbol{\nu}^{(k)} \le \mathbf{f}(\boldsymbol{\nu}^{(k)}) \le \boldsymbol{\nu}^{(k+1)} \le \mu\mathbf{f}$),
and converges to $\mu\mathbf{f}$. This theorem generalizes the result of Etessami and Yannakakis
in [EY09] to arbitrary clean and feasible SPPs and to the ordinary Newton's method.

For more quantitative results on the convergence speed it is convenient to focus on quadratic 
SPPs. Theorem 4.13 shows that any clean and feasible SPP can be syntactically transformed 
into a quadratic SPP without changing the least fixed point and without accelerating Newton's 
method. This means, one can perform Newton's method on the original (possibly non-quadratic) 
SPP and convergence will be at least as fast as for the corresponding quadratic SPP. 

For quadratic $n$-dimensional SPPs, one iteration of Newton's method involves $O(n^3)$
arithmetical operations and $O(n^3)$ operations in the Blum-Shub-Smale model. Hence, a bound on
the number of iterations needed to compute a given number of valid bits immediately leads
to a bound on the number of operations. In § 5 we obtain such bounds for strongly connected
quadratic SPPs. We give different thresholds for the number of iterations, and show that when
any of these thresholds is reached, Newton's method gains at least one valid bit for each
iteration. More precisely, Theorem 5.12 states the following. Let $\mathbf{f}$ be a quadratic,
clean and feasible scSPP, let $\mu_{\min}$ and $\mu_{\max}$ be the minimal and maximal component of
$\mu\mathbf{f}$, respectively, and let the coefficients of $\mathbf{f}$ be given as ratios of
$m$-bit integers. Then $\beta(k_f + i) \ge i$ holds for all $i \in \mathbb{N}$ and for any of the
following choices of $k_f$:

1. $4mn + \lceil 3n \max\{0, -\log \mu_{\min}\} \rceil$;

2. $4mn2^n$;

3. $7mn$ if $\mathbf{f}$ satisfies $\mathbf{f}(\mathbf{0}) \succ \mathbf{0}$;

4. $2m(n+1)$ if $\mathbf{f}$ satisfies both $\mathbf{f}(\mathbf{0}) \succ \mathbf{0}$ and $\mu_{\max} \le 1$.
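The four thresholds differ by orders of magnitude, which a small calculation makes visible. The sketch below evaluates whichever of the four choices apply for given parameters. The function name, its keyword arguments, and the reading of $\log$ as the base-2 logarithm are assumptions of this illustration.

```python
import math


def thresholds(m, n, mu_min=None, f0_positive=False, mu_max_le_one=False):
    """Applicable thresholds k_f from the list above, keyed by choice number.

    m: bit size of numerators/denominators of the coefficients
    n: number of equations of the quadratic, clean and feasible scSPP
    mu_min: minimal component of the least fixed point, if known
    f0_positive: whether f(0) is positive in every component
    mu_max_le_one: whether the maximal component of the least fixed point is <= 1
    """
    result = {}
    if mu_min is not None:
        # Choice 1; log is taken to base 2 here (an assumption of this sketch).
        result[1] = 4 * m * n + math.ceil(3 * n * max(0.0, -math.log2(mu_min)))
    result[2] = 4 * m * n * 2 ** n
    if f0_positive:
        result[3] = 7 * m * n
        if mu_max_le_one:
            result[4] = 2 * m * (n + 1)
    return result


if __name__ == "__main__":
    # Hypothetical parameters: 3 equations, 4-bit coefficients, mu_min = 1/4.
    print(thresholds(m=4, n=3, mu_min=0.25, f0_positive=True, mu_max_le_one=True))
```

Since a Newton approximant is a lower bound on $\mu\mathbf{f}$, its smallest component can be passed as a conservative `mu_min` once it is positive.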

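We further show that Newton iteration can also be used to obtain a sequence of upper
approximations of $\mu\mathbf{f}$. Those upper approximations converge to $\mu\mathbf{f}$,
asymptotically as fast as the Newton sequence. More precisely, Theorem 5.15 states the following:
Let $\mathbf{f}$ be a quadratic, clean and feasible scSPP, let $c_{\min}$ be the smallest nonzero
coefficient of $\mathbf{f}$, and let $\mu_{\min}$ be the minimal component of $\mu\mathbf{f}$.
Further, for all Newton approximants $\boldsymbol{\nu}^{(k)}$ with
$\boldsymbol{\nu}^{(k)} \succ \mathbf{0}$, let $\nu^{(k)}_{\min}$ be the smallest component of
$\boldsymbol{\nu}^{(k)}$. Then

$$\boldsymbol{\nu}^{(k)} \le \mu\mathbf{f} \le \boldsymbol{\nu}^{(k)} + \left[ \frac{\|\boldsymbol{\nu}^{(k)} - \boldsymbol{\nu}^{(k-1)}\|_\infty}{\left(c_{\min} \cdot \min\{\nu^{(k)}_{\min}, 1\}\right)^n} \right]$$

where $[s]$ denotes the vector $\mathbf{x}$ with $x_j = s$ for all $1 \le j \le n$.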

In § 6 we turn to general (not necessarily strongly connected) clean and feasible SPPs. We
show in Theorem 6.5 that Newton's method still converges linearly. Formally, the theorem
proves that for every quadratic, clean and feasible SPP $\mathbf{f}$, there is a threshold
$k_f \in \mathbb{N}$ and $\alpha_f > 0$ such that $\beta(k_f + \alpha_f \cdot i) \ge i$ for all
$i \in \mathbb{N}$. With respect to the threshold our proof is purely existential and does not
provide any bound for $k_f$. For $\alpha_f$ we show an upper bound of $n \cdot 2^n$, i.e.,
asymptotically at most $n \cdot 2^n$ extra iterations are needed in order to get one new valid
bit. § 7 exhibits a family of SPPs in which one new bit requires at least $2^{n-1}$ iterations,
implying that the bound on $\alpha_f$ is essentially tight.

Finally, § 8 gives a geometrical interpretation of Newton's method on quadratic SPP equations.
Let $R$ be the region bounded by the coordinate axes and by the quadrics corresponding
to the individual equations. Theorem 8.10 shows that all Kleene and Newton approximations
lie within $R$, i.e.: $\boldsymbol{\kappa}^{(i)}, \boldsymbol{\nu}^{(i)} \in R$ for every
$i \in \mathbb{N}$.
4 Fundamental Properties of Newton's Method

4.1 Effectiveness

Etessami and Yannakakis [EY09] suggested to use Newton's method for SPPs. More precisely,
they showed that the sequence obtained by applying Newton's method to the equation system
$\mathbf{X} = \mathbf{f}(\mathbf{X})$ converges to $\mu\mathbf{f}$ as long as $\mathbf{f}$ is
strongly connected. We extend their result to arbitrary SPPs, thereby reusing and extending
several proofs of [EY09].

In Definition 3.1 we defined the Newton operator $\mathcal{N}_f$ and the associated Newton
sequence $(\boldsymbol{\nu}^{(k)})_{k \in \mathbb{N}}$. In this section we prove the following
fundamental theorem on the Newton sequence.

Theorem 4.1. Let $\mathbf{f}$ be a clean and feasible SPP. Let the Newton operator
$\mathcal{N}_f$ be defined as in Definition 3.1:

$$\mathcal{N}_f(\mathbf{X}) := \mathbf{X} + (\mathrm{Id} - \mathbf{f}'(\mathbf{X}))^{-1} (\mathbf{f}(\mathbf{X}) - \mathbf{X})$$

1. Then the Newton sequence $(\boldsymbol{\nu}^{(k)})_{k \in \mathbb{N}}$ with
$\boldsymbol{\nu}^{(k)} = \mathcal{N}_f^k(\mathbf{0})$ is well-defined (i.e., the matrix inverses
exist), monotonically increasing, bounded from above by $\mu\mathbf{f}$ (i.e.
$\boldsymbol{\nu}^{(k)} \le \mathbf{f}(\boldsymbol{\nu}^{(k)}) \le \boldsymbol{\nu}^{(k+1)} \le \mu\mathbf{f}$),
and converges to $\mu\mathbf{f}$.

2. We have $(\mathrm{Id} - \mathbf{f}'(\boldsymbol{\nu}^{(k)}))^{-1} = \mathbf{f}'(\boldsymbol{\nu}^{(k)})^*$
for all $k \in \mathbb{N}$. We also have
$(\mathrm{Id} - \mathbf{f}'(\mathbf{x}))^{-1} = \mathbf{f}'(\mathbf{x})^*$ for all
$\mathbf{x} \prec \mu\mathbf{f}$.

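The proof of Theorem 4.1 consists of three steps. In the first proof step we study a sequence
generated by a somewhat weaker version of the Newton operator and obtain the following:

Proposition 4.2. Let $\mathbf{f}$ be a feasible SPP. Let the operator $\widehat{\mathcal{N}}_f$ be
defined as follows:

$$\widehat{\mathcal{N}}_f(\mathbf{X}) := \mathbf{X} + \sum_{d=0}^{\infty} \left( \mathbf{f}'(\mathbf{X})^d (\mathbf{f}(\mathbf{X}) - \mathbf{X}) \right).$$

Then the sequence $(\boldsymbol{\nu}^{(k)})_{k \in \mathbb{N}}$ with
$\boldsymbol{\nu}^{(k)} := \widehat{\mathcal{N}}_f^k(\mathbf{0})$ is monotonically increasing,
bounded from above by $\mu\mathbf{f}$ (i.e.
$\boldsymbol{\nu}^{(k)} \le \mathbf{f}(\boldsymbol{\nu}^{(k)}) \le \boldsymbol{\nu}^{(k+1)} \le \mu\mathbf{f}$)
and converges to $\mu\mathbf{f}$.

In a second proof step, we show another intermediary proposition, namely that the star of
the Jacobian matrix $\mathbf{f}'$ converges for all Newton approximants:

Proposition 4.3. Let $\mathbf{f}$ be clean and feasible. Then the matrix series
$\mathbf{f}'(\boldsymbol{\nu}^{(k)})^* := \mathrm{Id} + \mathbf{f}'(\boldsymbol{\nu}^{(k)}) + \mathbf{f}'(\boldsymbol{\nu}^{(k)})^2 + \cdots$
converges in $\mathbb{R}_{\ge 0}$ for all Newton approximants $\boldsymbol{\nu}^{(k)}$, i.e., there
are no $\infty$ entries.

In the third and final step we show that Propositions 4.2 and 4.3 imply Theorem 4.1.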

First Step. For the first proof step (i.e., the proof of Proposition 4.2) we will need the following 
generalization of Taylor's theorem. 

Lemma 4.4. Let $f$ be an SPP, $d \in \mathbb{N}$, and $0 \le u$, and $0 \le x \le f(x)$. Then

\[ f^d(x + u) \ge f^d(x) + f'(x)^d u . \]

In particular, by setting $u := f(x) - x$ we get

\[ f^{d+1}(x) - f^d(x) \ge f'(x)^d (f(x) - x) . \]

Proof. By induction on $d$. For $d = 0$ the statement is trivial. Let $d \ge 0$. Then, by Taylor's
theorem (Lemma 2.3), we have:

\begin{align*}
f^{d+1}(x + u) &= f(f^d(x + u)) \\
&\ge f(f^d(x) + f'(x)^d u) && \text{(induction hypothesis)} \\
&\ge f^{d+1}(x) + f'(f^d(x)) f'(x)^d u && \text{(Lemma 2.3)} \\
&\ge f^{d+1}(x) + f'(x)^{d+1} u && (f^d(x) \ge x)
\end{align*}

Lemma 4.4 can be used to prove the following. 
Lemma 4.5. Let $f$ be a feasible SPP. Let $0 \le x \le \mu f$ and $x \le f(x)$. Then

\[ x + \sum_{d=0}^{\infty} \left( f'(x)^d (f(x) - x) \right) \le \mu f . \]

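Proof. Observe that

\[ \lim_{d \to \infty} f^d(x) = \mu f \tag{1} \]

because $0 \le x \le \mu f$ implies $f^d(0) \le f^d(x) \le \mu f$, and as $(f^d(0))_{d \in \mathbb{N}}$ converges to $\mu f$ by
Theorem 2.4, so does $(f^d(x))_{d \in \mathbb{N}}$. We have:

\begin{align*}
x + \sum_{d=0}^{\infty} \left( f'(x)^d (f(x) - x) \right) &\le x + \sum_{d=0}^{\infty} \left( f^{d+1}(x) - f^d(x) \right) && \text{(Lemma 4.4)} \\
&= \lim_{d \to \infty} f^d(x) \\
&= \mu f && \text{(by (1))}
\end{align*}

□

Now we can prove Proposition 4.2.
Proof (of Proposition 4.2). First we prove the following inequality by induction on $k$:

\[ \nu^{(k)} \le f(\nu^{(k)}) \]

The induction base ($k = 0$) is easy. For the step, let $k \ge 0$. Then

\begin{align*}
\nu^{(k+1)} &= \nu^{(k)} + \sum_{d=0}^{\infty} f'(\nu^{(k)})^d (f(\nu^{(k)}) - \nu^{(k)}) \\
&= f(\nu^{(k)}) + f'(\nu^{(k)}) \sum_{d=0}^{\infty} f'(\nu^{(k)})^d (f(\nu^{(k)}) - \nu^{(k)}) \\
&\le f\Big( \nu^{(k)} + \sum_{d=0}^{\infty} f'(\nu^{(k)})^d (f(\nu^{(k)}) - \nu^{(k)}) \Big) && \text{(Lemma 2.3)} \\
&= f(\nu^{(k+1)}) .
\end{align*}

Now, the inequality $\nu^{(k)} \le \mu f$ follows from Lemma 4.5 by means of a straightforward
induction proof. Hence, it follows $f(\nu^{(k)}) \le f(\mu f) = \mu f$. Further we have

\[ f(\nu^{(k)}) = \nu^{(k)} + (f(\nu^{(k)}) - \nu^{(k)}) \le \nu^{(k)} + \sum_{d=0}^{\infty} f'(\nu^{(k)})^d (f(\nu^{(k)}) - \nu^{(k)}) = \nu^{(k+1)} . \tag{2} \]

So it remains to show that $(\nu^{(k)})_{k \in \mathbb{N}}$ converges to $\mu f$. As we have already shown $\nu^{(k)} \le \mu f$,
it suffices to show that $\kappa^{(k)} \le \nu^{(k)}$, because $(\kappa^{(k)})_{k \in \mathbb{N}}$ converges to $\mu f$ by Theorem 2.4. We
proceed by induction on $k$. The induction base ($k = 0$) is easy. For the step, let $k \ge 0$. Then

\begin{align*}
\kappa^{(k+1)} = f(\kappa^{(k)}) &\le f(\nu^{(k)}) && \text{(induction hypothesis)} \\
&\le \nu^{(k+1)} && \text{(by (2))}
\end{align*}

This completes the proof of Proposition 4.2 and, hence, the first step towards the proof of
Theorem 4.1. □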



Second Step. For the second proof step (i.e., the proof of Proposition 4.3) it is convenient
to move to the extended reals $\mathbb{R}_{[0,\infty]}$, i.e., we extend $\mathbb{R}_{\ge 0}$ by an element $\infty$ such that addition
satisfies $a + \infty = \infty + a = \infty$ for all $a \in \mathbb{R}_{\ge 0}$ and multiplication satisfies $0 \cdot \infty = \infty \cdot 0 = 0$
and $a \cdot \infty = \infty \cdot a = \infty$ for all $a \in \mathbb{R}_{> 0}$. In $\mathbb{R}_{[0,\infty]}$, one can rewrite $\widehat{\mathcal{N}}_f(\nu^{(k)}) = \nu^{(k)} + \sum_{d=0}^{\infty} f'(\nu^{(k)})^d (f(\nu^{(k)}) - \nu^{(k)})$
as $\nu^{(k)} + f'(\nu^{(k)})^* (f(\nu^{(k)}) - \nu^{(k)})$. Notice that Proposition 4.3
does not follow trivially from Proposition 4.2, because $\infty$ entries of $f'(\nu^{(k)})^*$ could be cancelled
out by matching 0 entries of $f(\nu^{(k)}) - \nu^{(k)}$.

For the proof of Proposition 4.3 we need several lemmata. The following lemma assures that
a starred matrix has an $\infty$ entry if and only if it has an $\infty$ entry on the diagonal.

Lemma 4.6. Let $A = (a_{ij}) \in \mathbb{R}_{\ge 0}^{n \times n}$. Let $A^*$ have an $\infty$ entry. Then $A^*$ also has an $\infty$ entry
on the diagonal, i.e., $[A^*]_{ii} = \infty$ for some $1 \le i \le n$.

Proof. By induction on $n$. The base case $n = 1$ is clear. For $n > 1$ assume w.l.o.g. that $[A^*]_{1n} = \infty$. We have

\[ [A^*]_{1n} = [A^*]_{11} \sum_{j=2}^{n} a_{1j} \left[ (A_{[2..n,2..n]})^* \right]_{jn} , \tag{3} \]

where by $A_{[2..n,2..n]}$ we mean the square matrix obtained from $A$ by erasing the first row and
the first column. To see why (3) holds, think of $[A^*]_{1n}$ as the sum of weights of paths from 1 to
$n$ in the complete graph over the vertices $\{1, \ldots, n\}$. The weight of a path $P$ is the product of
the weights of $P$'s edges, and $a_{i_1 i_2}$ is the weight of the edge from $i_1$ to $i_2$. Each path $P$ from 1
to $n$ can be divided into two subpaths $P_1, P_2$ as follows. The second subpath $P_2$ is the suffix of
$P$ leading from 1 to $n$ and not returning to 1. The first subpath $P_1$, possibly empty, is chosen
such that $P = P_1 P_2$. Now, the sum of weights of all possible $P_1$ equals $[A^*]_{11}$, and the sum of
weights of all possible $P_2$ equals $\sum_{j=2}^{n} a_{1j} [(A_{[2..n,2..n]})^*]_{jn}$. So (3) holds.

As $[A^*]_{1n} = \infty$, it follows that either $[A^*]_{11}$ or some $[(A_{[2..n,2..n]})^*]_{jn}$ equals $\infty$. In the first
case, we are done. In the second case, by induction, there is an $i$ such that $[(A_{[2..n,2..n]})^*]_{ii} = \infty$.
But then also $[A^*]_{ii} = \infty$, because every entry of $(A_{[2..n,2..n]})^*$ is less than or equal to the
corresponding entry of $A^*$. □

The following lemma treats the case that $f$ is strongly connected (cf. [EY09]).

Lemma 4.7. Let $f$ be clean, feasible and non-trivially strongly connected. Let $0 \le x \prec \mu f$.
Then $f'(x)^*$ does not have $\infty$ as an entry.

Proof. By Theorem 2.4 the Kleene sequence $(\kappa^{(i)})_{i \in \mathbb{N}}$ converges to $\mu f$. Furthermore, $\kappa^{(i)} \prec \mu f$
holds for all $i$, because, as every component depends non-trivially on itself, any increase in any
component results in an increase of the same component in a later Kleene approximant. So, we
can choose a Kleene approximant $y = \kappa^{(i)}$ such that $x \le y \prec \mu f$. Notice that $y \le f(y)$. By
monotonicity of $f'$ it suffices to show that $f'(y)^*$ does not have $\infty$ as an entry. By Lemma 4.4
(taking $x := y$ and $u := \mu f - y$) we have

\[ f'(y)^d (\mu f - y) \le \mu f - f^d(y) . \]

As $d \to \infty$, the right hand side converges to 0, because, by Kleene's theorem, $f^d(y)$ converges
to $\mu f$. So the left hand side also converges to 0. Since $\mu f - y \succ 0$, every entry of $f'(y)^d$ must
converge to 0. Then, by standard facts about matrices (see e.g. [LT85]), the spectral radius
of $f'(y)$ is less than 1, i.e., $|\lambda| < 1$ for all eigenvalues $\lambda$ of $f'(y)$. This, in turn, implies that the
series $f'(y)^* = \mathrm{Id} + f'(y) + f'(y)^2 + \cdots$ converges in $\mathbb{R}_{\ge 0}$, see [LT85], page 531. In other words,
$f'(y)^*$ and hence $f'(x)^*$ do not have $\infty$ as an entry. □
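The matrix fact used at the end of this proof (a nonnegative matrix with spectral radius below 1 has a finite star, which equals the inverse of $\mathrm{Id} - A$) can be checked numerically. The following Python sketch does so for one small matrix; the matrix `A` (eigenvalues 0.5 and 0.1) and all function names are assumptions made for this illustration and do not come from the source.

```python
# Illustrative sketch only: A is an arbitrary nonnegative 2x2 matrix with
# spectral radius 0.5 (eigenvalues 0.5 and 0.1); it is an assumption,
# not a Jacobian taken from the paper.

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(a, b):
    n = len(a)
    return [[a[i][j] + b[i][j] for j in range(n)] for i in range(n)]

def star_partial_sum(a, terms):
    # Id + A + A^2 + ... + A^terms, a truncation of the star A^*
    total = identity(len(a))
    power = identity(len(a))
    for _ in range(terms):
        power = mat_mul(power, a)
        total = mat_add(total, power)
    return total

def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[0.2, 0.3], [0.1, 0.4]]
star = star_partial_sum(A, 60)
id_minus_a = [[1.0 - A[0][0], -A[0][1]], [-A[1][0], 1.0 - A[1][1]]]
inverse = inverse_2x2(id_minus_a)

if __name__ == "__main__":
    print(star)
    print(inverse)
```

If the spectral radius were 1 or larger, the partial sums would grow without bound in at least one entry, which corresponds to an $\infty$ entry of the star in the extended reals.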

The following lemma states that Newton's method can only terminate in a component $s$
after certain other components $\ell$ have reached $\mu f_\ell$.

Lemma 4.8. Let $1 \le s, \ell \le n$. Let the term $[f'(X)^*]_{ss}$ contain the variable $X_\ell$. Let $0 \le x \le f(x) \le \mu f$
and $x_s < \mu f_s$ and $x_\ell < \mu f_\ell$. Then $\widehat{\mathcal{N}}_f(x)_s < \mu f_s$.

Proof. This proof follows closely a proof of [EY09]. Let $d \ge 0$ such that $[f'(X)^d]_{ss}$ contains $X_\ell$.
Let $m' \ge 0$ such that $f^{m'}(x) \succ 0$ and $f^{m'}(x)_\ell > x_\ell$. Such an $m'$ exists because with Kleene's
theorem the sequence $(f^k(x))_{k \in \mathbb{N}}$ converges to $\mu f$. Notice that our choice of $m'$ guarantees

\[ \left[ f'(f^{m'}(x))^d \right]_{ss} > \left[ f'(x)^d \right]_{ss} . \]

Now choose $m \ge m'$ such that $f^{m+1}(x)_s > f^m(x)_s$. Such an $m$ exists because the sequence
$(f^k(x)_s)_{k \in \mathbb{N}}$ never reaches $\mu f_s$. This is because $s$ depends on itself (since $[f'(X)^*]_{ss}$ is not
constant 0), and so every increase of the $s$-component results in an increase of the $s$-component
in some later iteration of the Kleene sequence.

Now we have

\begin{align*}
f^{d+m+1}(x) - f^{d+m}(x) &\ge f'(f^m(x))^d (f^{m+1}(x) - f^m(x)) && \text{(Lemma 4.4)} \\
&\ge^* f'(x)^d (f^{m+1}(x) - f^m(x)) \\
&\ge f'(x)^d f'(x)^m (f(x) - x) && \text{(Lemma 4.4)} \\
&= f'(x)^{d+m} (f(x) - x) .
\end{align*}

The inequality marked with * is strict in the $s$-component, due to the choice of $d$ and $m$
above. So, with $b = d + m$ we have:

\[ (f^{b+1}(x) - f^b(x))_s > (f'(x)^b (f(x) - x))_s \tag{4} \]

Again by Lemma 4.4, inequality (4) holds for all $b \in \mathbb{N}$, but with $\ge$ instead of $>$. Therefore:

\begin{align*}
\mu f_s &= \Big( x + \sum_{b=0}^{\infty} (f^{b+1}(x) - f^b(x)) \Big)_s && \text{(Kleene)} \\
&> \left( x + f'(x)^* (f(x) - x) \right)_s && \text{(inequality (4))} \\
&= (\widehat{\mathcal{N}}_f(x))_s
\end{align*}

□

Now we are ready to prove Proposition 4.3.

Proof (of Proposition 4.3). Using Lemma 4.6 it is enough to show that $[f'(\nu^{(k)})^*]_{ss} \ne \infty$ for
all $s$. If the $s$-component constitutes a trivial SCC, then $[f'(\nu^{(k)})^*]_{ss} = 1 \ne \infty$. So we can
assume in the following that the $s$-component belongs to a non-trivial SCC, say $S$. Let $X_L$ be
the set of variables contained by the term $[f'(X)^*]_{ss}$. For any $t \in S$ we have $[f'(X)^*]_{ss} \ge
[f'(X)^*]_{st} [f'(X)^*]_{tt} [f'(X)^*]_{ts}$. Neither $[f'(X)^*]_{st}$ nor $[f'(X)^*]_{ts}$ is constant zero, because
$S$ is non-trivial. Therefore, $[f'(X)^*]_{ss}$ contains all variables that $[f'(X)^*]_{tt}$ contains, and vice
versa, for all $t \in S$. So, $X_L$ is, for all $t \in S$, exactly the set of variables contained by $[f'(X)^*]_{tt}$.
We distinguish two cases.

Case 1: There is a component $\ell \in L$ such that the sequence $(\nu_\ell^{(k)})_{k \in \mathbb{N}}$ does not terminate, i.e.,
$\nu_\ell^{(k)} < \mu f_\ell$ holds for all $k$. Then, by Lemma 4.8, the sequence $(\nu_s^{(k)})_{k \in \mathbb{N}}$ cannot reach $\mu f_s$ either.
In fact, we have $\nu_S^{(k)} \prec \mu f_S$. Let $M$ denote the set of those components that the $S$-components
depend on, but do not depend on $S$. In other words, $M$ contains the components that are "lower"
in the DAG of SCCs than $S$. Define $g(X_S) := f_S(X)[M / \mu f_M]$. Then $g(X_S)$ is an scSPP with
$\mu g = \mu f_S$. As $\nu_S^{(k)} \prec \mu f_S$, Lemma 4.7 is applicable, so $g'(\nu_S^{(k)})^*$ does not have $\infty$ as an entry.
With $[f'(\nu^{(k)})^*]_{SS} \le g'(\nu_S^{(k)})^*$, we get $[f'(\nu^{(k)})^*]_{ss} < \infty$, as desired.

Case 2: For all components $\ell \in L$ the sequence $(\nu_\ell^{(k)})_{k \in \mathbb{N}}$ terminates. Let $i \in \mathbb{N}$ be the
least number such that $\nu_\ell^{(i)} = \mu f_\ell$ holds for all $\ell \in L$. By Lemma 4.8 we have $\nu_s^{(i)} < \mu f_s$.
But as, according to Proposition 4.2, $(\nu_s^{(k)})_{k \in \mathbb{N}}$ converges to $\mu f_s$, there must exist a
$j \ge i$ such that $0 < (f'(\nu^{(j)})^* (f(\nu^{(j)}) - \nu^{(j)}))_s < \infty$. So there is a component $u$ with
$0 < [f'(\nu^{(j)})^*]_{su} (f(\nu^{(j)}) - \nu^{(j)})_u < \infty$. This implies $0 < [f'(\nu^{(j)})^*]_{su} < \infty$, therefore also
$[f'(\nu^{(j)})^*]_{ss} < \infty$. By monotonicity of $f'$, we have $[f'(\nu^{(k)})^*]_{ss} \le [f'(\nu^{(j)})^*]_{ss} < \infty$ for all
$k \le j$. On the other hand, since $[f'(X)^*]_{ss}$ contains only $L$-variables and $\nu_L^{(k)} = \mu f_L$ holds for
all $k \ge j$, we also have $[f'(\nu^{(k)})^*]_{ss} = [f'(\nu^{(j)})^*]_{ss} < \infty$ for all $k \ge j$. □

This completes the second intermediary step towards the proof of Theorem 4.1. 



Third and Final Step. Now we can use Proposition 4.2 and Proposition 4.3 to complete the 
proof of Theorem 4.1. 

Proof (of Theorem 4.1). By Proposition 4.3 the matrix $f'(\nu^{(k)})^*$ has no $\infty$ entries. Then we
clearly have $f'(\nu^{(k)})^* (\mathrm{Id} - f'(\nu^{(k)})) = \mathrm{Id}$, so $(\mathrm{Id} - f'(\nu^{(k)}))^{-1} = f'(\nu^{(k)})^*$, which is the first
claim of part 2. of the theorem. Hence, we also have

\begin{align*}
\widehat{\mathcal{N}}_f(\nu^{(k)}) &= \nu^{(k)} + \sum_{d=0}^{\infty} f'(\nu^{(k)})^d (f(\nu^{(k)}) - \nu^{(k)}) \\
&= \nu^{(k)} + (\mathrm{Id} - f'(\nu^{(k)}))^{-1} (f(\nu^{(k)}) - \nu^{(k)}) \\
&= \mathcal{N}_f(\nu^{(k)}) ,
\end{align*}

so we can replace $\widehat{\mathcal{N}}_f$ by $\mathcal{N}_f$. Therefore, part 1. of the theorem is implied by Proposition 4.2. It
remains to show $(\mathrm{Id} - f'(x))^{-1} = f'(x)^*$ for all $x \prec \mu f$. It suffices to show that $f'(x)^*$ has
no $\infty$ entries. By part 1. the sequence $(\nu^{(k)})_{k \in \mathbb{N}}$ converges to $\mu f$. So there is a $k'$ such that
$x \le \nu^{(k')}$. By Proposition 4.3, $f'(\nu^{(k')})^*$ has no $\infty$ entries, so, by monotonicity, $f'(x)^*$ has no
$\infty$ entries either. □

4.2 Monotonicity

Lemma 4.9 (Monotonicity of the Newton operator). Let $f$ be a clean and feasible SPP.
Let $0 \le x \le y \le f(y) \le \mu f$ and let $\mathcal{N}_f(y)$ exist. Then

\[ \mathcal{N}_f(x) \le \mathcal{N}_f(y) . \]

Proof. For $x \le y$ we have $f'(x) \le f'(y)$ as every entry of $f'(X)$ is a monotone polynomial.
Hence, $f'(x)^* \le f'(y)^*$. With this at hand we get:

\begin{align*}
\mathcal{N}_f(y) &= y + f'(y)^* (f(y) - y) && \text{(Theorem 4.1)} \\
&\ge y + f'(x)^* (f(y) - y) && (f'(y)^* \ge f'(x)^*) \\
&\ge y + f'(x)^* (f(x) + f'(x)(y - x) - y) && \text{(Lemma 2.3)} \\
&= y + f'(x)^* \left( (f(x) - x) - (\mathrm{Id} - f'(x))(y - x) \right) \\
&= y + f'(x)^* (f(x) - x) - (y - x) && (f'(x)^* = (\mathrm{Id} - f'(x))^{-1}) \\
&= \mathcal{N}_f(x) && \text{(Theorem 4.1)}
\end{align*}

□

4.3 Exponential Convergence Order in the Nonsingular Case 

If the matrix $\mathrm{Id} - f'(\mu f)$ is nonsingular, Newton's method has exponential convergence order
in the sense of Definition 2.7. This is, in fact, a well known general property of Newton's
method, see, e.g., Theorem 4.4 of [SM03]. For completeness, we show that Newton's method for
"nonsingular" SPPs has exponential convergence order, see Theorem 4.12 below.

Lemma 4.10. Let $f$ be a clean and feasible SPP. Let $0 \le x \le \mu f$ such that $f'(x)^*$ exists.
Then there is a bilinear function $B : \mathbb{R}_{\ge 0}^n \times \mathbb{R}_{\ge 0}^n \to \mathbb{R}_{\ge 0}^n$ with

\[ \mu f - \mathcal{N}_f(x) \le f'(x)^* B(\mu f - x, \mu f - x) . \]

Proof. Write $d := \mu f - x$. By Taylor's theorem (cf. Lemma 2.3) we obtain

\[ f(x + d) \le f(x) + f'(x) d + B(d, d) \tag{5} \]

for the bilinear map $B(X) := f''(\mu f)(X, X)$, where $f''(\mu f)$ denotes the rank-3 tensor of the
second partial derivatives evaluated at $\mu f$ [OR70]. We have

\begin{align*}
\mu f - \mathcal{N}_f(x) &= d - f'(x)^* (f(x) - x) \\
&= d - f'(x)^* (d + f(x) - (x + d)) \\
&= d - f'(x)^* (d + f(x) - f(x + d)) && (x + d = \mu f = f(\mu f)) \\
&\le d - f'(x)^* (d - f'(x) d - B(d, d)) && \text{(by (5))} \\
&= d - f'(x)^* \left( (\mathrm{Id} - f'(x)) d - B(d, d) \right) \\
&= d - d + f'(x)^* B(d, d) && (f'(x)^* = (\mathrm{Id} - f'(x))^{-1}) \\
&= f'(x)^* B(d, d)
\end{align*}

□

Define for the following lemmata $\Delta^{(k)} := \mu f - \nu^{(k)}$, i.e., $\Delta^{(k)}$ is the error after $k$ Newton
iterations. The following lemma bounds $\|\Delta^{(k+1)}\|$ in terms of $\|\Delta^{(k)}\|^2$ if $\mathrm{Id} - f'(\mu f)$ is
nonsingular.

Lemma 4.11. Let $f$ be a clean and feasible SPP such that $\mathrm{Id} - f'(\mu f)$ is nonsingular. Then
there is a constant $c > 0$ such that

\[ \|\Delta^{(k+1)}\| \le c \cdot \|\Delta^{(k)}\|^2 \quad \text{for all } k \in \mathbb{N}. \]

Proof. As $\mathrm{Id} - f'(\mu f)$ is nonsingular, we have, by Theorem 4.1, $(\mathrm{Id} - f'(x))^{-1} = f'(x)^*$ for
all $0 \le x \le \mu f$. By continuity, there is a $c_1 > 0$ such that $\|f'(x)^*\| \le c_1$ for all $0 \le x \le \mu f$.
Similarly, there is a $c_2 > 0$ such that $\|B(x, x)\| \le c_2 \|x\|^2$ for all $0 \le x \le \mu f$, because $B$ is
bilinear. So it follows from Lemma 4.10 that

\[ \|\Delta^{(k+1)}\| \le c_1 c_2 \|\Delta^{(k)}\|^2 . \]

□

Lemma 4.11 implies that Newton's method has an exponential convergence order in the
nonsingular case. More precisely:

Theorem 4.12. Let $f$ be a clean and feasible SPP such that $\mathrm{Id} - f'(\mu f)$ is nonsingular. Then
there is a constant $k_f \in \mathbb{N}$ such that

\[ \beta(k_f + i) \ge 2^i \quad \text{for all } i \in \mathbb{N}. \]

Proof. We first show that there is a constant $k_f' \in \mathbb{N}$ such that

\[ \|\Delta^{(k_f' + i)}\| \le 2^{-2^i} \quad \text{for all } i \in \mathbb{N}. \tag{6} \]

We can assume w.l.o.g. that $c \ge 1$ for the $c$ from Lemma 4.11. As the $\Delta^{(k)}$ converge to 0, we
can choose $k_f' \in \mathbb{N}$ large enough such that $d := -\log \|\Delta^{(k_f')}\| - \log c \ge 1$. As $c, d \ge 1$, it suffices
to show the following inequality:

\[ \|\Delta^{(k_f' + i)}\| \le \frac{2^{-d \cdot 2^i}}{c} \]

We proceed by induction on $i$. For $i = 0$, the inequality above follows from the definition of $d$.
Let $i \ge 0$. Then

\begin{align*}
\|\Delta^{(k_f' + i + 1)}\| &\le c \cdot \|\Delta^{(k_f' + i)}\|^2 && \text{(Lemma 4.11)} \\
&\le c \cdot \frac{2^{-d \cdot 2^i \cdot 2}}{c^2} && \text{(induction hypothesis)} \\
&= \frac{2^{-d \cdot 2^{i+1}}}{c} .
\end{align*}

Hence, (6) is proved.

Choose $m \in \mathbb{N}$ large enough such that $2^{m+i} + \log(\mu f_j) \ge 2^i$ holds for all components $j$. Thus

\begin{align*}
\frac{\Delta_j^{(k_f' + m + i)}}{\mu f_j} &\le \frac{\|\Delta^{(k_f' + m + i)}\|}{\mu f_j} \\
&\le \frac{2^{-2^{m+i}}}{\mu f_j} = 2^{-(2^{m+i} + \log(\mu f_j))} && \text{(by (6))} \\
&\le 2^{-2^i} && \text{(choice of } m\text{)} .
\end{align*}

So, with $k_f := k_f' + m$, the approximant $\nu^{(k_f + i)}$ has at least $2^i$ valid bits of $\mu f$. □

This type of analysis has serious shortcomings. In particular, Theorem 4.12 excludes the case
where $\mathrm{Id} - f'(\mu f)$ is singular. We will include this case in our convergence analysis in § 5 and
§ 6. Furthermore, and maybe more severely, Theorem 4.12 does not give any bound on $k_f$. We
solve this problem for strongly connected SPPs in § 5.



4.4 Reduction to the Quadratic Case 

In this section we reduce SPPs to quadratic SPPs, i.e., to SPPs in which every polynomial $f_i(X)$
has degree at most 2, and show that the convergence on the quadratic SPP is no faster than
on the original SPP. In the following sections we will obtain convergence speed guarantees of
Newton's method on quadratic SPPs. Hence, one can perform Newton's method on the original
SPP and, using the results of this section, convergence is at least as fast as on the corresponding
quadratic SPP.

The idea to reduce the degree of our SPP $f$ is to introduce auxiliary variables that express
quadratic subterms. This can be done repeatedly until all polynomials in the system have reached
degree at most 2. The construction is very similar to the one that transforms a context-free
grammar into another grammar in Chomsky normal form. The following theorem shows that
the transformation does not accelerate the convergence of Newton's method.

Theorem 4.13. Let $f(X)$ be a clean and feasible SPP such that $f_s(X) = g(X) + h(X) X_i X_j$
for some $1 \le i, j, s \le n$, where $g(X)$ and $h(X)$ are polynomials with nonnegative coefficients.
Let $\tilde{f}(X, Y)$ be the SPP given by

\begin{align*}
\tilde{f}_\ell(X, Y) &= f_\ell(X) && \text{for every } \ell \in \{1, \ldots, s-1\} \\
\tilde{f}_s(X, Y) &= g(X) + h(X) Y \\
\tilde{f}_\ell(X, Y) &= f_\ell(X) && \text{for every } \ell \in \{s+1, \ldots, n\} \\
\tilde{f}_{n+1}(X, Y) &= X_i X_j .
\end{align*}

Then the function $b : \mathbb{R}^n \to \mathbb{R}^{n+1}$ given by $b(X) = (X_1, \ldots, X_n, X_i X_j)^\top$ is a bijection between
the set of fixed points of $f(X)$ and $\tilde{f}(X, Y)$. Moreover, $\tilde{\nu}^{(k)} \le (\nu_1^{(k)}, \ldots, \nu_n^{(k)}, \nu_i^{(k)} \nu_j^{(k)})^\top$ for all
$k \in \mathbb{N}$, where $\tilde{\nu}^{(k)}$ and $\nu^{(k)}$ are the Newton approximants of $\tilde{f}$ and $f$, respectively.

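Proof. We first show the claim regarding $b$: if $x$ is a fixed point of $f$, then $b(x) = (x, x_i x_j)$ is a
fixed point of $\tilde{f}$. Conversely, if $(x, y)$ is a fixed point of $\tilde{f}$, then we have $y = x_i x_j$, implying that
$x$ is a fixed point of $f$. Therefore, the least fixed point $\mu f$ of $f$ determines $\mu \tilde{f}$, and vice versa.

Now we show that the Newton sequence of $f$ converges at least as fast as the Newton sequence
of $\tilde{f}$. In the following we write $\mathbf{Y}$ for the $(n+1)$-dimensional vector of variables $(X_1, \ldots, X_n, Y)^\top$
and, as usual, $X$ for $(X_1, \ldots, X_n)^\top$. For an $(n+1)$-dimensional vector $x$, we let $x_{[1,n]}$ denote
its restriction to the $n$ first components, i.e., $x_{[1,n]} := (x_1, \ldots, x_n)^\top$. Note that $\mathbf{Y}_{[1,n]} = X$. Let
$e_s$ denote the unit vector $(0, \ldots, 0, 1, 0, \ldots, 0)^\top$, where the "1" is on the $s$-th place. We have:

\[ \tilde{f}(\mathbf{Y}) = \begin{pmatrix} f(X) + e_s h(X)(Y - X_i X_j) \\ X_i X_j \end{pmatrix} \]

and

\[ \tilde{f}'(\mathbf{Y}) = \begin{pmatrix} f'(X) + e_s \partial_X h(X)(Y - X_i X_j) - e_s h(X) \partial_X (X_i X_j) & e_s h(X) \\ \partial_X (X_i X_j) & 0 \end{pmatrix} . \]

We need the following lemma.

Lemma 4.14. Let $z \in \mathbb{R}_{\ge 0}^n$, $\delta = (\mathrm{Id} - f'(z))^{-1} (f(z) - z)$ and $\tilde{\delta} =
(\mathrm{Id} - \tilde{f}'(z, z_i z_j))^{-1} (\tilde{f}(z, z_i z_j) - (z, z_i z_j)^\top)$. Then $\delta = \tilde{\delta}_{[1,n]}$.

Proof of the lemma.

\[ \tilde{f}'(z, z_i z_j) = \begin{pmatrix} f'(z) - e_s h(z) \partial_X (X_i X_j)|_{X=z} & e_s h(z) \\ \partial_X (X_i X_j)|_{X=z} & 0 \end{pmatrix} \]

We have $(\mathrm{Id} - \tilde{f}'(z, z_i z_j)) \tilde{\delta} = \tilde{f}(z, z_i z_j) - (z, z_i z_j)^\top$, or equivalently:

\[ \begin{pmatrix} \mathrm{Id} - f'(z) + e_s h(z) \partial_X (X_i X_j)|_{X=z} & -e_s h(z) \\ -\partial_X (X_i X_j)|_{X=z} & 1 \end{pmatrix} \begin{pmatrix} \tilde{\delta}_{[1,n]} \\ \tilde{\delta}_{n+1} \end{pmatrix} = \begin{pmatrix} f(z) - z \\ 0 \end{pmatrix} \]

Multiplying the last row by $e_s h(z)$ and adding to the first $n$ rows yields:

\[ (\mathrm{Id} - f'(z)) \tilde{\delta}_{[1,n]} = f(z) - z \]

So we have $\tilde{\delta}_{[1,n]} = (\mathrm{Id} - f'(z))^{-1} (f(z) - z) = \delta$, which proves the lemma. □

Now we proceed by induction on $k$ to show $\tilde{\nu}^{(k)}_{[1,n]} \le \nu^{(k)}$, where $\nu^{(k)}$ is the Newton sequence
for $f$. By definition of the Newton sequence this is true for $k = 0$. For the step, let $k \ge 0$ and
define $u := (\tilde{\nu}^{(k)}_{[1,n]}, \tilde{\nu}^{(k)}_i \cdot \tilde{\nu}^{(k)}_j)^\top$. Then we have:

\begin{align*}
\tilde{\nu}^{(k+1)}_{[1,n]} = \mathcal{N}_{\tilde{f}}(\tilde{\nu}^{(k)})_{[1,n]} &\le^{(*)} \mathcal{N}_{\tilde{f}}(u)_{[1,n]} && \text{(see below)} \\
&= u_{[1,n]} + \left( (\mathrm{Id} - \tilde{f}'(u))^{-1} (\tilde{f}(u) - u) \right)_{[1,n]} \\
&= \tilde{\nu}^{(k)}_{[1,n]} + (\mathrm{Id} - f'(\tilde{\nu}^{(k)}_{[1,n]}))^{-1} (f(\tilde{\nu}^{(k)}_{[1,n]}) - \tilde{\nu}^{(k)}_{[1,n]}) && \text{(Lemma 4.14)} \\
&= \mathcal{N}_f(\tilde{\nu}^{(k)}_{[1,n]}) \\
&\le \mathcal{N}_f(\nu^{(k)}) = \nu^{(k+1)} && \text{(induction)}
\end{align*}

At the inequality marked with (*) we used the monotonicity of $\mathcal{N}_{\tilde{f}}$ (Lemma 4.9) combined
with Theorem 4.1, which states $\tilde{\nu}^{(k)} \le \tilde{f}(\tilde{\nu}^{(k)})$, hence in particular $\tilde{\nu}^{(k)}_{n+1} \le \tilde{\nu}^{(k)}_i \tilde{\nu}^{(k)}_j$. This
concludes the proof of Theorem 4.13. □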
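The statement of Theorem 4.13 can be observed on a tiny example. The following Python sketch is an illustration under assumptions, not part of the source: it takes the one-dimensional cubic SPP $f(X) = \frac{1}{4}X^3 + \frac{1}{2}$ (so $g = \frac{1}{2}$, $h(X) = \frac{1}{4}X$, $i = j = s = 1$), builds the quadratic system $\tilde{f}(X, Y) = (\frac{1}{4}XY + \frac{1}{2}, X^2)$, and runs Newton's method on both. All function names are hypothetical.

```python
# Illustrative sketch only: the cubic SPP f(X) = 0.25*X^3 + 0.5 and its
# quadratic reduction are assumed examples, not systems from the paper.

def newton_step_original(x):
    # One Newton step for f(X) = 0.25*X^3 + 0.5
    fx = 0.25 * x ** 3 + 0.5
    dfx = 0.75 * x ** 2
    return x + (fx - x) / (1.0 - dfx)

def newton_step_quadratic(x, y):
    # One Newton step for f~(X, Y) = (0.25*X*Y + 0.5, X*X)
    r1 = 0.25 * x * y + 0.5 - x          # first component of f~(x, y) - (x, y)
    r2 = x * x - y                       # second component
    a11, a12 = 1.0 - 0.25 * y, -0.25 * x  # rows of Id - f~'(x, y)
    a21, a22 = -2.0 * x, 1.0
    det = a11 * a22 - a12 * a21
    dx = (r1 * a22 - a12 * r2) / det     # Cramer's rule for the 2x2 system
    dy = (a11 * r2 - a21 * r1) / det
    return x + dx, y + dy

def compare(steps):
    # Returns the pairs (nu^(k), nu~^(k)) for k = 0, ..., steps
    nu, nu_tilde = 0.0, (0.0, 0.0)
    history = [(nu, nu_tilde)]
    for _ in range(steps):
        nu = newton_step_original(nu)
        nu_tilde = newton_step_quadratic(*nu_tilde)
        history.append((nu, nu_tilde))
    return history

if __name__ == "__main__":
    for k, (nu, (x, y)) in enumerate(compare(8)):
        print(k, nu, x, y)
```

In the printed table the first component of the quadratic system's iterate never exceeds the iterate of the original system, and its second component never exceeds the square of the original iterate, in line with the theorem.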

5 Strongly Connected SPPs 

In this section we study the convergence speed of Newton's method on strongly connected SPPs, 
short scSPPs, see Definition 2.6. 

5.1 Cone Vectors 

Our convergence speed analysis makes crucial use of the existence of cone vectors.

Definition 5.1. Let $f$ be an SPP. A vector $d \in \mathbb{R}_{\ge 0}^n$ is a cone vector if $d \succ 0$ and $f'(\mu f) d \le d$.

We will show that any scSPP has a cone vector, see Proposition 5.4 below. As a first step,
we show the following lemma.

Lemma 5.2. Any clean and feasible scSPP $f$ has a vector $d > 0$ with $f'(\mu f) d \le d$.

Proof. Consider the Kleene sequence $(\kappa^{(k)})_{k \in \mathbb{N}}$. Since $f$ is strongly connected, we have $0 \le
\kappa^{(k)} \prec \mu f$ for all $k \in \mathbb{N}$. By Theorem 4.1.2., the matrices $(\mathrm{Id} - f'(\kappa^{(k)}))^{-1} = f'(\kappa^{(k)})^*$ exist for
all $k$. Let $\|\cdot\|$ be any norm. Define the vectors

\[ d^{(k)} := \frac{f'(\kappa^{(k)})^* \mathbf{1}}{\| f'(\kappa^{(k)})^* \mathbf{1} \|} . \]

Notice that for all $k \in \mathbb{N}$ we have $(\mathrm{Id} - f'(\kappa^{(k)})) d^{(k)} = \frac{1}{\| f'(\kappa^{(k)})^* \mathbf{1} \|} \cdot \mathbf{1} \ge 0$. Furthermore we have
$d^{(k)} \in C$, where $C := \{ x \ge 0 \mid \|x\| = 1 \}$ is compact. So the sequence $(d^{(k)})_{k \in \mathbb{N}}$ has a convergent
subsequence, whose limit, say $d$, is also in $C$. In particular $d > 0$. As $(\kappa^{(k)})_{k \in \mathbb{N}}$ converges to $\mu f$
and $(\mathrm{Id} - f'(\kappa^{(k)})) d^{(k)} \ge 0$, it follows by continuity $(\mathrm{Id} - f'(\mu f)) d \ge 0$. □

Lemma 5.3. Let $f$ be a clean and feasible scSPP and let $d > 0$ with $f'(\mu f) d \le d$. Then $d$ is
a cone vector, i.e., $d \succ 0$.

Proof. Since $f$ is an SPP, every component of $f'(\mu f)$ is nonnegative. So,

\[ 0 \le f'(\mu f)^n d \le f'(\mu f)^{n-1} d \le \ldots \le f'(\mu f) d \le d . \]

Let w.l.o.g. $d_1 > 0$. As $f$ is strongly connected, there is for all $j$ with $1 \le j \le n$ an $r_j \le n$ such
that $(f'(\mu f)^{r_j})_{j1} > 0$. Hence, $(f'(\mu f)^{r_j} d)_j > 0$ for all $j$. With the above inequality chain, it follows
that $d_j \ge (f'(\mu f)^{r_j} d)_j > 0$. So, $d \succ 0$. □

The following proposition follows immediately by combining Lemmata 5.2 and 5.3.

Proposition 5.4. Any clean and feasible scSPP has a cone vector.

We remark that using Perron-Frobenius theory [BP79] there is a simpler proof for Proposition 5.4:
By Theorem 4.1, $f'(x)^*$ exists for all $x \prec \mu f$. So, by fundamental matrix facts [BP79],
the spectral radius of $f'(x)$ is less than 1 for all $x \prec \mu f$. As the eigenvalues of a matrix depend
continuously on the matrix, the spectral radius of $f'(\mu f)$, say $\rho$, is at most 1. Since $f$ is
strongly connected, $f'(\mu f)$ is irreducible, and so Perron-Frobenius theory guarantees the existence
of an eigenvector $d \succ 0$ of $f'(\mu f)$ with eigenvalue $\rho$. So we have $f'(\mu f) d = \rho d \le d$, i.e.,
the eigenvector $d$ is a cone vector.

5.2 Convergence Speed in Terms of Cone Vectors 

Now we show that cone vectors play a fundamental role for the convergence speed of Newton's 
method. The following lemma gives a lower bound of the Newton approximant in terms of 
a cone vector. 

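Lemma 5.5. Let $f$ be a feasible (not necessarily clean) SPP such that $f'(0)^*$ exists. Let $d$ be
a cone vector of $f$. Let $0 \ge \mu f - \lambda d$ for some $\lambda \ge 0$. Then

\[ \mathcal{N}_f(0) \ge \mu f - \tfrac{1}{2} \lambda d . \]

Proof. We write $f(X)$ as a sum $f(X) = c + \sum_{k=1}^{D} T^{(k)}(X, \ldots, X)$, where $D$ is the degree of $f$
and, for all $k \in \{1, \ldots, D\}$ and all $i \in \{1, \ldots, n\}$, the component $T_i^{(k)}$ of $T^{(k)}$ is the symmetric
$k$-linear form associated to the degree-$k$ terms of $f_i$. Let $L^{(k)} : (\mathbb{R}^n)^{k-1} \to \mathbb{R}^{n \times n}$ such that
$T^{(k)}(X^{(1)}, \ldots, X^{(k)}) = L^{(k)}(X^{(1)}, \ldots, X^{(k-1)}) \cdot X^{(k)}$. Now we can write

\[ f(X) = c + \sum_{k=1}^{D} L^{(k)}(X, \ldots, X) X \quad \text{and} \quad f'(X) = \sum_{k=1}^{D} k \cdot L^{(k)}(X, \ldots, X) . \]

We write $L$ for $L^{(1)}$, and $h(X)$ for $f(X) - LX - c$. We have:

\begin{align*}
\tfrac{\lambda}{2} d &= \tfrac{\lambda}{2} (L^* d - L^* L d) && (L^* = \mathrm{Id} + L^* L) \\
&\ge \tfrac{\lambda}{2} (L^* f'(\mu f) d - L^* L d) && (f'(\mu f) d \le d) \\
&= \tfrac{\lambda}{2} L^* h'(\mu f) d && (f'(x) = h'(x) + L) \\
&= L^* \tfrac{1}{2} h'(\mu f) \lambda d \\
&\ge L^* \tfrac{1}{2} h'(\mu f) \mu f && (\lambda d \ge \mu f) \\
&= L^* \tfrac{1}{2} \sum_{k=2}^{D} k \cdot L^{(k)}(\mu f, \ldots, \mu f) \mu f \\
&\ge L^* \sum_{k=2}^{D} L^{(k)}(\mu f, \ldots, \mu f) \mu f \\
&= L^* h(\mu f) \\
&= L^* (f(\mu f) - L \mu f - c) && (f(x) = h(x) + Lx + c) \\
&= L^* \mu f - L^* L \mu f - L^* c \\
&= \mu f - L^* c && (L^* = \mathrm{Id} + L^* L) \\
&= \mu f - \mathcal{N}_f(0) && (\mathcal{N}_f(0) = f'(0)^* f(0) = L^* c)
\end{align*}

□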



We extend Lemma 5.5 to arbitrary vectors $x$ as follows.

Lemma 5.6. Let $f$ be a feasible (not necessarily clean) SPP. Let $0 \le x \le \mu f$ and $x \le f(x)$
such that $f'(x)^*$ exists. Let $d$ be a cone vector of $f$. Let $x \ge \mu f - \lambda d$ for some $\lambda \ge 0$. Then

\[ \mathcal{N}_f(x) \ge \mu f - \tfrac{1}{2} \lambda d . \]

Proof. Define $g(X) := f(X + x) - x$. We first show that $g$ is an SPP (not necessarily clean).
The only coefficients of $g$ that could be negative are those of degree 0. But we have $g(0) =
f(x) - x \ge 0$, and so these coefficients are also nonnegative.

It follows immediately from the definition that $\mu f - x \ge 0$ is the least fixed point of $g$.
Moreover, $g$ satisfies $g'(\mu f - x) d \le d$, and so $d$ is also a cone vector of $g$. Finally, we have
$0 \ge \mu f - x - \lambda d = \mu g - \lambda d$. So, Lemma 5.5 can be applied as follows.

\begin{align*}
\mathcal{N}_f(x) &= x + f'(x)^* (f(x) - x) \\
&= x + g'(0)^* (g(0) - 0) \\
&= x + \mathcal{N}_g(0) \\
&\ge x + \mu g - \tfrac{1}{2} \lambda d && \text{(Lemma 5.5)} \\
&= \mu f - \tfrac{1}{2} \lambda d
\end{align*}

□

By induction we can extend this lemma to the whole Newton sequence:

Lemma 5.7. Let $d$ be a cone vector of a clean and feasible SPP $f$ and let $\lambda_{max} = \max_j \{ \mu f_j / d_j \}$.
Then

\[ \nu^{(k)} \ge \mu f - 2^{-k} \lambda_{max} d . \]

Fig. 2. Illustration of Lemma 5.7: The points (shape: +) on the ray $r$ along a cone vector are lower
bounds on the Newton approximants (shape: ×).

Before proving the lemma we illustrate it by a picture. The dashed line in Figure 2 is the ray
$r(t) = \mu f - t d$ along a cone vector $d$. Notice that $r(0)$ equals $\mu f$ and $r(\lambda_{max})$ is the greatest
point on the ray that is below 0. The figure also shows the Newton iterates $\nu^{(k)}$ for $0 \le k \le 2$
(shape: ×) and the corresponding points $r(2^{-k} \lambda_{max})$ (shape: +) located on the ray $r$. Observe
that $\nu^{(k)} \ge r(2^{-k} \lambda_{max})$, as claimed by Lemma 5.7.

Proof (of Lemma 5.7). By induction on $k$. For the induction base ($k = 0$) we have for all
components $i$:

\[ (\mu f - \lambda_{max} d)_i = \Big( \mu f - \max_j \Big\{ \frac{\mu f_j}{d_j} \Big\} d \Big)_i \le \mu f_i - \frac{\mu f_i}{d_i} d_i = 0 , \]

so $\nu^{(0)} = 0 \ge \mu f - \lambda_{max} d$.

For the induction step, let $k \ge 0$. By induction hypothesis we have $\nu^{(k)} \ge \mu f - 2^{-k} \lambda_{max} d$.
So we can apply Lemma 5.6 to get

\[ \nu^{(k+1)} = \mathcal{N}_f(\nu^{(k)}) \ge \mu f - \tfrac{1}{2} 2^{-k} \lambda_{max} d = \mu f - 2^{-(k+1)} \lambda_{max} d . \]

□



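The following proposition guarantees a convergence order of the Newton sequence in terms
of a cone vector.

Proposition 5.8. Let $d$ be a cone vector of a clean and feasible SPP $f$ and let $\lambda_{max} =
\max_j \{ \mu f_j / d_j \}$ and $\lambda_{min} = \min_j \{ \mu f_j / d_j \}$. Let $k_{f,d} = \lceil \log \frac{\lambda_{max}}{\lambda_{min}} \rceil$. Then $\beta(k_{f,d} + i) \ge i$ for all
$i \in \mathbb{N}$.

Proof. For all $1 \le j \le n$ the following holds.

\begin{align*}
\mu f_j - \nu_j^{(k_{f,d} + i)} &\le 2^{-(k_{f,d} + i)} \lambda_{max} d_j && \text{(Lemma 5.7)} \\
&\le \frac{\lambda_{min}}{\lambda_{max}} 2^{-i} \lambda_{max} d_j = \lambda_{min} d_j 2^{-i} && \text{(def. of } k_{f,d}\text{)} \\
&\le \mu f_j \cdot 2^{-i} && \text{(def. of } \lambda_{min}\text{)}
\end{align*}

Hence, $\nu^{(k_{f,d} + i)}$ has $i$ valid bits of $\mu f$. □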

5.3 Convergence Speed Independent from Cone Vectors 

The convergence order provided by Proposition 5.8 depends on a cone vector $d$. While Proposition 5.4
guarantees the existence of a cone vector for scSPPs, it does not give any information
on the magnitude of its components. So we do not have any bound yet on the "threshold" $k_{f,d}$
from Proposition 5.8. The following theorem solves this problem.

Theorem 5.9. Let $f$ be a quadratic, clean and feasible scSPP. Let $c_{min}$ be the smallest nonzero
coefficient of $f$ and let $\mu_{min}$ and $\mu_{max}$ be the minimal and maximal component of $\mu f$, respectively. Let

\[ k_f = \left\lceil \log \frac{\mu_{max}}{\mu_{min} \cdot (c_{min} \cdot \min\{\mu_{min}, 1\})^n} \right\rceil . \]

Then

\[ \beta(k_f + i) \ge i \quad \text{for all } i \in \mathbb{N}. \]
Before we prove Theorem 5.9 we give an example. 

Example 5.10. As an example of application of Theorem 5.9 consider the scSPP equation of the
back button process of Example 2.9.

\[ \begin{pmatrix} X_1 \\ X_2 \\ X_3 \end{pmatrix} = \begin{pmatrix} 0.4 X_2 X_1 + 0.6 \\ 0.3 X_1 X_2 + 0.4 X_3 X_2 + 0.3 \\ 0.3 X_1 X_3 + 0.7 \end{pmatrix} \]

We wish to know if there is a component $s \in \{1, 2, 3\}$ with $\mu f_s = 1$. Notice that $f(\mathbf{1}) = \mathbf{1}$,
so $\mu f \le \mathbf{1}$. Performing 14 Newton steps (e.g. with Maple) yields an approximation $\nu^{(14)}$ to $\mu f$
with

\[ \begin{pmatrix} 0.98 \\ 0.97 \\ 0.992 \end{pmatrix} \le \nu^{(14)} \le \begin{pmatrix} 0.99 \\ 0.98 \\ 0.993 \end{pmatrix} . \]

We have $c_{min} = 0.3$. In addition, since Newton's method converges to $\mu f$ from below, we
know $\mu_{min} \ge 0.97$. Moreover, $\mu_{max} \le 1$, as $\mathbf{1} = f(\mathbf{1})$ and so $\mu f \le \mathbf{1}$. Hence
$k_f \le \left\lceil \log \frac{1}{0.97 \cdot (0.3 \cdot 0.97)^3} \right\rceil = 6$. Theorem 5.9 then implies that $\nu^{(14)}$ has 8 valid bits of $\mu f$. As
$\mu f \le \mathbf{1}$, the absolute errors are bounded by the relative errors, and since $2^{-8} \le 0.004$ we know:

\[ \mu f \le \nu^{(14)} + \begin{pmatrix} 2^{-8} \\ 2^{-8} \\ 2^{-8} \end{pmatrix} \le \begin{pmatrix} 0.994 \\ 0.984 \\ 0.997 \end{pmatrix} \prec \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \]

So Theorem 5.9 yields a proof that $\mu f_s < 1$ for all three components $s$.

Notice also that the Newton sequence converges much faster than the Kleene sequence
$(\kappa^{(k)})_{k \in \mathbb{N}}$. We have $\kappa^{(14)} \le (0.89, 0.83, 0.96)^\top$, so $\kappa^{(14)}$ has no more than 4 valid bits in any
component, whereas $\nu^{(14)}$ has, in fact, more than 30 valid bits in each component. □
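The computations of this example can be retraced with a short program. The following Python sketch is only an illustration under assumptions: the function names, the plain Gaussian elimination, and the use of floating-point arithmetic are choices made here (the source mentions Maple). It runs 14 Newton and 14 Kleene steps on the system above and evaluates the threshold of Theorem 5.9, with the logarithm taken to base 2, from the bounds $\mu_{max} \le 1$, $\mu_{min} \ge 0.97$, $c_{min} = 0.3$ used in the text.

```python
import math

# Illustrative sketch only: function names and the floating-point solver are
# assumptions; the polynomial system is the one displayed in Example 5.10.

def f(x):
    x1, x2, x3 = x
    return [0.4 * x2 * x1 + 0.6,
            0.3 * x1 * x2 + 0.4 * x3 * x2 + 0.3,
            0.3 * x1 * x3 + 0.7]

def jacobian(x):
    x1, x2, x3 = x
    return [[0.4 * x2, 0.4 * x1, 0.0],
            [0.3 * x2, 0.3 * x1 + 0.4 * x3, 0.4 * x2],
            [0.3 * x3, 0.0, 0.3 * x1]]

def solve_linear(a, b):
    # Gaussian elimination with partial pivoting for a small dense system.
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        rest = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - rest) / m[i][i]
    return x

def newton_step(x):
    # x + (Id - f'(x))^{-1} (f(x) - x)
    n = len(x)
    jac = jacobian(x)
    fx = f(x)
    a = [[(1.0 if i == j else 0.0) - jac[i][j] for j in range(n)]
         for i in range(n)]
    delta = solve_linear(a, [fx[i] - x[i] for i in range(n)])
    return [x[i] + delta[i] for i in range(n)]

def iterate(step, k):
    x = [0.0, 0.0, 0.0]
    for _ in range(k):
        x = step(x)
    return x

def threshold_kf(mu_max, mu_min, c_min, n):
    # Threshold of Theorem 5.9, assuming log is the base-2 logarithm.
    bound = mu_max / (mu_min * (c_min * min(mu_min, 1.0)) ** n)
    return math.ceil(math.log2(bound))

nu14 = iterate(newton_step, 14)      # Newton approximant after 14 steps
kappa14 = iterate(f, 14)             # Kleene approximant after 14 steps
kf_bound = threshold_kf(1.0, 0.97, 0.3, 3)

if __name__ == "__main__":
    print("Newton:", nu14)
    print("Kleene:", kappa14)
    print("bound on k_f:", kf_bound)
```

Because the program works in floating point, its output is numerical evidence only; the strict conclusion $\mu f_s < 1$ in the text rests on the error bound of Theorem 5.9, not on the printed digits.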

For the proof of Theorem 5.9 we need the following lemma. 

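Lemma 5.11. Let $d$ be a cone vector of a quadratic, clean and feasible scSPP $f$. Let $c_{min}$ be
the smallest nonzero coefficient of $f$ and $\mu_{min}$ the minimal component of $\mu f$. Let $d_{min}$ and $d_{max}$
be the smallest and the largest component of $d$, respectively. Then

\[ \frac{d_{min}}{d_{max}} \ge (c_{min} \cdot \min\{\mu_{min}, 1\})^n . \]

Proof. In what follows we shorten $\mu f$ to $\mu$. Let w.l.o.g. $d_1 = d_{max}$ and $d_n = d_{min}$. We claim
the existence of indices $s, t$ with $1 \le s, t \le n$ such that $f'_{st}(\mu) \ne 0$ and

\[ \frac{d_{min}}{d_{max}} \ge \left( \frac{d_s}{d_t} \right)^n . \tag{7} \]

To prove that such $s, t$ exist, we use the fact that $f$ is strongly connected, i.e., that there is a
sequence $1 = r_1, r_2, \ldots, r_q = n$ with $q \le n$ such that $f'_{r_{j+1} r_j}(X)$ is not constant zero. As $\mu \succ 0$,
we have $f'_{r_{j+1} r_j}(\mu) \ne 0$. Furthermore

\[ \frac{d_1}{d_n} = \frac{d_{r_1}}{d_{r_2}} \cdots \frac{d_{r_{q-1}}}{d_{r_q}} , \]

and so

\[ \log \frac{d_1}{d_n} = \log \frac{d_{r_1}}{d_{r_2}} + \cdots + \log \frac{d_{r_{q-1}}}{d_{r_q}} . \]

So there must exist a $j$ such that

\[ \log \frac{d_1}{d_n} \le (q - 1) \log \frac{d_{r_j}}{d_{r_{j+1}}} \le n \log \frac{d_{r_j}}{d_{r_{j+1}}} , \quad \text{and so} \quad \frac{d_n}{d_1} \ge \left( \frac{d_{r_{j+1}}}{d_{r_j}} \right)^n . \]

Hence one can choose $s = r_{j+1}$ and $t = r_j$.

As $d$ is a cone vector we have $f'(\mu) d \le d$ and thus $f'_{st}(\mu) d_t \le d_s$. Hence

\[ f'_{st}(\mu) \le \frac{d_s}{d_t} . \tag{8} \]

On the other hand, since $f$ is quadratic, $f'$ is a linear mapping such that

\[ f'_{st}(\mu) = 2 (b_1 \cdot \mu_1 + \cdots + b_n \cdot \mu_n) + \ell \]

where $b_1, \ldots, b_n$ and $\ell$ are coefficients of quadratic, respectively linear, monomials of $f$. As
$f'_{st}(\mu) \ne 0$, at least one of these coefficients must be nonzero and so greater than or equal to
$c_{min}$. It follows $f'_{st}(\mu) \ge c_{min} \cdot \min\{\mu_{min}, 1\}$. So we have

\begin{align*}
(c_{min} \cdot \min\{\mu_{min}, 1\})^n &\le (f'_{st}(\mu))^n \\
&\le \left( \frac{d_s}{d_t} \right)^n && \text{(by (8))} \\
&\le \frac{d_{min}}{d_{max}} && \text{(by (7))} .
\end{align*}

□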

Now we can prove Theorem 5.9. 
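Proof (of Theorem 5.9). By Proposition 5.4, $f$ has a cone vector $d$. Let $d_{max} = \max_j \{d_j\}$ and
$d_{min} = \min_j \{d_j\}$ and $\lambda_{max} = \max_j \{ \mu f_j / d_j \}$ and $\lambda_{min} = \min_j \{ \mu f_j / d_j \}$. We have:

\begin{align*}
\frac{\lambda_{max}}{\lambda_{min}} &\le \frac{\mu_{max} \cdot d_{max}}{\mu_{min} \cdot d_{min}} && \Big( \text{as } \lambda_{max} \le \frac{\mu_{max}}{d_{min}} \text{ and } \lambda_{min} \ge \frac{\mu_{min}}{d_{max}} \Big) \\
&\le \frac{\mu_{max}}{\mu_{min} \cdot (c_{min} \cdot \min\{\mu_{min}, 1\})^n} && \text{(Lemma 5.11)} .
\end{align*}

So the statement follows with Proposition 5.8. □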

The following consequence of Theorem 5.9 removes some of the parameters on which the $k_f$
from Theorem 5.9 depends.

Theorem 5.12. Let $f$ be a quadratic, clean and feasible scSPP, let $\mu_{min}$ and $\mu_{max}$ be the
minimal and maximal component of $\mu f$, respectively, and let the coefficients of $f$ be given as
ratios of $m$-bit integers. Then

\[ \beta(k_f + i) \ge i \quad \text{for all } i \in \mathbb{N} \]

holds for any of the following choices of $k_f$.

1. $\lceil 4mn + 3n \max\{0, -\log \mu_{min}\} \rceil$;

2. $4mn2^n$;

3. $7mn$ whenever $f(0) \succ 0$;

4. $2mn + m$ whenever both $f(0) \succ 0$ and $\mu_{max} \le 1$.

Items 3. and 4. of Theorem 5.12 apply in particular to termination SPPs of strict pPDAs (§ 2.4),
i.e., they satisfy $f(0) \succ 0$ and $\mu_{max} \le 1$.

To prove Theorem 5.12 we need some relations between the parameters of /. We collect 
them in the following lemma. 

Lemma 5.13. Let $f$ be a quadratic, clean and feasible scSPP. With the terminology of Theorem 5.9
and Theorem 5.12 the following relations hold.

1. $c_{min} \ge 2^{-m}$.

2. If $f(0) \succ 0$ then $\mu_{min} \ge c_{min}$.

3. If $c_{min} > 1$ then $\mu_{min} > 1$.

4. If $c_{min} \le 1$ then $\mu_{min} \ge c_{min}^{2^n - 1}$.

5. If $f$ is strictly quadratic, i.e. nonlinear, then the following inequalities hold: $c_{min} \le 1$ and
$\mu_{max} \cdot c_{min}^{3n-2} \cdot \min\{\mu_{min}^{2n-2}, 1\} \le 1$.

Proof. We show the relations in turn.

1. The smallest nonzero coefficient representable as a ratio of $m$-bit numbers is $\frac{1}{2^m - 1}$.

2. As $f(0) \succ 0$, in all components $i$ there is a nonzero coefficient $c_i$ such that $f_i(0) = c_i$. We
have $\mu f \ge f(0)$, so $\mu f_i \ge f_i(0) = c_i \ge c_{min} > 0$ holds for all $i$. Hence $\mu_{min} \ge c_{min} > 0$.

3. Let $c_{min} > 1$. Recall the Kleene sequence $(\kappa^{(k)})_{k \in \mathbb{N}}$ with $\kappa^{(k)} = f^k(0)$. We first show by
induction on $k$ that for all $k \in \mathbb{N}$ and all components $i$ either $\kappa_i^{(k)} = 0$ holds or $\kappa_i^{(k)} > 1$. For
the induction base we have $\kappa^{(0)} = 0$. Let $k \ge 0$. Then $\kappa_i^{(k+1)} = f_i(\kappa^{(k)})$ is a sum of products
of numbers which are either coefficients of $f$ (and hence by assumption greater than 1) or
which are equal to $\kappa_j^{(k)}$ for some $j$. By induction, $\kappa_j^{(k)}$ is either 0 or greater than 1. So, $\kappa_i^{(k+1)}$
must be 0 or greater than 1.

By Theorem 2.4, the Kleene sequence converges to $\mu f$. As $f$ is clean, we have $\mu f \succ 0$, and
so there is a $k \in \mathbb{N}$ such that $\kappa^{(k)} \succ \mathbf{1}$. The statement follows with $\mu f \ge \kappa^{(k)}$.

4. Let $c_{min} \le 1$. We prove the following stronger statement by induction on $k$: For every $k$ with
$0 \le k \le n$ there is a set $S_k \subseteq \{1, \ldots, n\}$, $|S_k| = k$, such that $\mu f_s \ge c_{min}^{2^k - 1}$ holds for all $s \in S_k$.
The induction base ($k = 0$) is trivial. Let $k \ge 0$. Consider the SPP $\tilde{f}(X_{\{1,\ldots,n\} \setminus S_k})$ that is
obtained from $f(X)$ by removing the $S_k$-components from $f$ and replacing every $S_k$-variable
in the polynomials by the corresponding component of $\mu f$. Clearly, $\mu \tilde{f} = (\mu f)_{\{1,\ldots,n\} \setminus S_k}$.
By induction, the smallest nonzero coefficient $\tilde{c}_{min}$ of $\tilde{f}$ satisfies $\tilde{c}_{min} \ge c_{min} (c_{min}^{2^k - 1})^2 =
c_{min}^{2^{k+1} - 1}$. Pick a component $i$ with $\tilde{f}_i(0) > 0$. Then $\mu f_i \ge \tilde{f}_i(0) \ge \tilde{c}_{min} \ge c_{min}^{2^{k+1} - 1}$. So set
$S_{k+1} := S_k \cup \{i\}$.

5. Let w.l.o.g. $\mu_{max} = \mu f_1$. The proof is based on the idea that $X_1$ indirectly depends quadratically
on itself. More precisely, as $f$ is strongly connected and strictly quadratic, component 1
depends (indirectly) on some component, say $i_r$, such that $f_{i_r}$ contains a degree-2-monomial.
The variables in that monomial, in turn, depend on $X_1$. This gives an inequality of the form
$\mu f_1 \ge C \cdot \mu f_1^2$, implying $\mu f_1 \cdot C \le 1$.

We give the details in the following. As $f$ is strongly connected and strictly quadratic
there exists a sequence of variables $X_{i_1}, \ldots, X_{i_r}$ and a sequence of monomials $m_{i_1}, \ldots, m_{i_r}$
($1 \le r \le n$) with the following properties:

- $X_{i_1} = X_1$,

- $m_{i_u}$ is a monomial appearing in $f_{i_u}$ ($1 \le u \le r$),

- $m_{i_u} = c_{i_u} \cdot X_{i_{u+1}}$ ($1 \le u < r$),

- $m_{i_r} = c_{i_r} \cdot X_{j_1} \cdot X_{k_1}$ for some variables $X_{j_1}, X_{k_1}$.

Notice that

\[ \mu f_1 \ge c_{i_1} \cdot \ldots \cdot c_{i_r} \cdot \mu f_{j_1} \cdot \mu f_{k_1} \ge \min(c_{min}^n, 1) \cdot \mu f_{j_1} \cdot \mu f_{k_1} . \tag{9} \]

Again using that $f$ is strongly connected, there exists a sequence of variables $X_{j_1}, \ldots, X_{j_s}$
and a sequence of monomials $m_{j_1}, \ldots, m_{j_{s-1}}$ ($1 \le s \le n$) with the following properties:

- $X_{j_s} = X_1$,

- $m_{j_u}$ is a monomial appearing in $f_{j_u}$ ($1 \le u \le s - 1$),

- $m_{j_u} = c_{j_u} \cdot X_{j_{u+1}}$ or $m_{j_u} = c_{j_u} \cdot X_{j_{u+1}} \cdot X_{j'_u}$
for some variable $X_{j'_u}$ ($1 \le u \le s - 1$).

Notice that

\[ \mu f_{j_1} \ge c_{j_1} \cdot \ldots \cdot c_{j_{s-1}} \cdot \min(\mu_{min}^{s-1}, 1) \cdot \mu f_1 \ge \min(c_{min}^{n-1}, 1) \cdot \min(\mu_{min}^{n-1}, 1) \cdot \mu f_1 . \tag{10} \]

Similarly, there exists a sequence of variables $X_{k_1}, \ldots, X_{k_t}$ ($1 \le t \le n$) with $X_{k_t} = X_1$
showing

\[ \mu f_{k_1} \ge \min(c_{min}^{n-1}, 1) \cdot \min(\mu_{min}^{n-1}, 1) \cdot \mu f_1 . \tag{11} \]

Combining (9) with (10) and (11) yields

\[ \mu f_1 \ge \min(c_{min}^{3n-2}, 1) \cdot \min(\mu_{min}^{2n-2}, 1) \cdot \mu f_1^2 \]

or

\[ \mu_{max} \cdot \min(c_{min}^{3n-2}, 1) \cdot \min(\mu_{min}^{2n-2}, 1) \le 1 . \tag{12} \]

Now it suffices to show $c_{min} \le 1$. Assume for a contradiction $c_{min} > 1$. Then, by statement 3.,
$\mu_{min} > 1$. Plugging this into (12) yields $\mu_{max} \le 1$. This implies $\mu_{max} < \mu_{min}$, contradicting
the definition of $\mu_{max}$ and $\mu_{min}$.

□



Now we are ready to prove Theorem 5.12. 
Proof (of Theorem 5.12). 

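1. First we check the case where $f$ is linear, i.e., all polynomials $f_i$ have degree at most 1. In this case, Newton's method reaches $\mu f$ after one iteration, so the statement holds. Consequently, we can assume in the following that $f$ is strictly quadratic, meaning that $f$ is quadratic and there is a polynomial in $f$ of degree 2.
By Theorem 5.9 it suffices to show

\[ \log \frac{\mu_{\max}}{\mu_{\min} \cdot c_{\min}^n \cdot \min\{\mu_{\min}^n, 1\}} \le 4mn + 3n \max\{0, -\log \mu_{\min}\}. \]

We have

\[
\begin{aligned}
\log \frac{\mu_{\max}}{\mu_{\min} \cdot c_{\min}^n \cdot \min\{\mu_{\min}^n, 1\}}
&\le \log \frac{1}{c_{\min}^{4n-2} \cdot \min\{\mu_{\min}^{3n-1}, 1\}} && \text{(Lemma 5.13.5.)} \\
&\le 4n \cdot \log \frac{1}{c_{\min}} - \log(\min\{\mu_{\min}^{3n-1}, 1\}) && \text{(Lemma 5.13.5.: } c_{\min} \le 1\text{)} \\
&\le 4mn - \log(\min\{\mu_{\min}^{3n-1}, 1\}) && \text{(Lemma 5.13.1.)}.
\end{aligned}
\]

If $\mu_{\min} \ge 1$ we have $-\log(\min\{\mu_{\min}^{3n-1}, 1\}) \le 0$, so we are done in this case. If $\mu_{\min} \le 1$ we have $-\log(\min\{\mu_{\min}^{3n-1}, 1\}) = -(3n-1) \log \mu_{\min} \le 3n \cdot (-\log \mu_{\min})$.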

2. By statement 1. of this theorem, it suffices to show that $4mn + 3n \max\{0, -\log \mu_{\min}\} \le 4mn2^n$. This inequality obviously holds if $\mu_{\min} \ge 1$. So let $\mu_{\min} \le 1$. Then, by Lemma 5.13.3., $c_{\min} \le 1$. Hence, by Lemma 5.13 parts 4. and 1., $\mu_{\min} \ge c_{\min}^{2^n - 1} \ge 2^{-m(2^n - 1)}$. So we have an upper bound on $-\log \mu_{\min}$ with $-\log \mu_{\min} \le m(2^n - 1)$ and get:

\[
\begin{aligned}
4mn + 3n \max\{0, -\log \mu_{\min}\} &\le 4mn + 3nm(2^n - 1) \\
&\le 4mn + 4nm(2^n - 1) = 4mn2^n.
\end{aligned}
\]



3. Let $f(0) \succ 0$. By statement 1. of this theorem it suffices to show that $4mn + 3n \max\{0, -\log \mu_{\min}\} \le 7mn$ holds. By Lemma 5.13 parts 2. and 1., we have $\mu_{\min} \ge c_{\min} \ge 2^{-m}$, so $-\log \mu_{\min} \le m$. Hence, $4mn + 3n \max\{0, -\log \mu_{\min}\} \le 4mn + 3nm = 7mn$.






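4. Let $f(0) \succ 0$ and $\mu_{\max} \le 1$. By Theorem 5.9 it suffices to show that

\[ \log \frac{\mu_{\max}}{\mu_{\min} \cdot c_{\min}^n \cdot \min\{\mu_{\min}^n, 1\}} \le 2mn + m. \]

We have:

\[
\begin{aligned}
\log \frac{\mu_{\max}}{\mu_{\min} \cdot c_{\min}^n \cdot \min\{\mu_{\min}^n, 1\}}
&\le -n \log c_{\min} - \log(\mu_{\min} \cdot \min\{\mu_{\min}^n, 1\}) \\
&\le -n \log c_{\min} - (n+1) \log \mu_{\min} && \text{(as } \mu_{\min} \le \mu_{\max} \le 1\text{)} \\
&\le -(2n+1) \log c_{\min} && \text{(Lemma 5.13.2.)} \\
&\le 2mn + m && \text{(Lemma 5.13.1.)}
\end{aligned}
\]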



□ 



5.4 Upper Bounds on the Least Fixed Point Via Newton Approximants 

By Theorem 4.1 each Newton approximant $\nu^{(k)}$ is a lower bound on $\mu f$. Theorem 5.9 and Theorem 5.12 give us upper bounds on the error $\Delta^{(k)} := \mu f - \nu^{(k)}$. Those bounds can directly be transformed into upper bounds on $\mu f$, as $\mu f = \nu^{(k)} + \Delta^{(k)}$, cf. Example 5.10.

Theorem 5.9 and Theorem 5.12 allow us to compute bounds on $\Delta^{(k)}$ even before the Newton iteration has been started. However, this may be more than we actually need. In practice, we may wish to use an iterative method that yields guaranteed lower and upper bounds on $\mu f$ that improve during the iteration. The following theorem and its corollary can be used to this end.

Theorem 5.14. Let $f$ be a quadratic, clean and feasible scSPP. Let $0 \le x \le \mu f$ and $x \le f(x)$ such that $f'(x)^*$ exists. Let $c_{\min}$ be the smallest nonzero coefficient of $f$ and $\mu_{\min}$ the minimal component of $\mu f$. Then

\[ \frac{\|\mathcal{N}(x) - x\|_\infty}{\|\mu f - \mathcal{N}(x)\|_\infty} \ge (c_{\min} \cdot \min\{\mu_{\min}, 1\})^n. \]

We prove Theorem 5.14 at the end of the section. The theorem can be applied to the Newton approximants:

Theorem 5.15. Let $f$ be a quadratic, clean and feasible scSPP. Let $c_{\min}$ be the smallest nonzero coefficient of $f$ and $\mu_{\min}$ the minimal component of $\mu f$. For all Newton approximants $\nu^{(k)}$ with $\nu^{(k)} \succ 0$, let $\nu^{(k)}_{\min}$ be the smallest component of $\nu^{(k)}$. Then

\[ \nu^{(k)} \le \mu f \le \nu^{(k)} + \left[ \frac{\|\nu^{(k)} - \nu^{(k-1)}\|_\infty}{(c_{\min} \cdot \min\{\nu^{(k)}_{\min}, 1\})^n} \right] \]

where $[s]$ denotes the vector $x$ with $x_j = s$ for all $1 \le j \le n$.

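Proof (of Theorem 5.15). Theorem 5.14 applies, due to Theorem 4.1, to the Newton approximants with $x = \nu^{(k-1)}$. So we get

\[ \|\mu f - \nu^{(k)}\|_\infty \le \frac{\|\nu^{(k)} - \nu^{(k-1)}\|_\infty}{(c_{\min} \cdot \min\{\mu_{\min}, 1\})^n} \le \frac{\|\nu^{(k)} - \nu^{(k-1)}\|_\infty}{(c_{\min} \cdot \min\{\nu^{(k)}_{\min}, 1\})^n} \qquad \text{(as } \nu^{(k)} \le \mu f\text{)}. \]

Hence the statement follows from $\nu^{(k)} \le \mu f$.

Example 5.16. Consider again the equation $X = f(X)$ from Examples 2.9 and 5.10: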

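\[
\begin{pmatrix} X_1 \\ X_2 \\ X_3 \end{pmatrix} =
\begin{pmatrix} 0.4 X_2 X_1 + 0.6 \\ 0.3 X_1 X_2 + 0.4 X_3 X_2 + 0.3 \\ 0.3 X_1 X_3 + 0.7 \end{pmatrix}
\]

Again we wish to verify that there is no component $s \in \{1,2,3\}$ with $\mu f_s = 1$. Performing 10 Newton steps yields an approximation $\nu^{(10)}$ to $\mu f$ with

\[ \begin{pmatrix} 0.9828 \\ 0.9738 \\ 0.9926 \end{pmatrix} \le \nu^{(10)} \le \begin{pmatrix} 0.9829 \\ 0.9739 \\ 0.9927 \end{pmatrix}. \]

Further, it holds $\|\nu^{(10)} - \nu^{(9)}\|_\infty \le 2 \cdot 10^{-6}$. So we have

\[ \frac{\|\nu^{(10)} - \nu^{(9)}\|_\infty}{(c_{\min} \cdot \min\{\nu^{(10)}_{\min}, 1\})^3} \le \frac{2 \cdot 10^{-6}}{(0.3 \cdot 0.97)^3} \le 0.00009 \]

and hence by Theorem 5.15

\[ \nu^{(10)} \le \mu f \le \nu^{(10)} + [0.00009] \le \begin{pmatrix} 0.983 \\ 0.974 \\ 0.993 \end{pmatrix}. \]

In particular we know that $\mu f_s < 1$ for all three components $s$.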
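The check in Example 5.16 can be reproduced numerically. The following Python sketch is not part of the original development. It assumes that the Newton operator has the form $\mathcal{N}(x) = x + (\mathrm{Id} - f'(x))^{-1}(f(x) - x)$ and that the slack of Theorem 5.15 is $\|\nu^{(k)} - \nu^{(k-1)}\|_\infty / (c_{\min} \cdot \min\{\nu^{(k)}_{\min}, 1\})^n$. The helper names `solve`, `newton_step` and `newton_with_upper_bound` are illustrative only.

```python
def f(x):
    """The SPP of Example 5.16."""
    x1, x2, x3 = x
    return [0.4 * x2 * x1 + 0.6,
            0.3 * x1 * x2 + 0.4 * x3 * x2 + 0.3,
            0.3 * x1 * x3 + 0.7]


def jacobian(x):
    """Matrix of partial derivatives f'(x)."""
    x1, x2, x3 = x
    return [[0.4 * x2, 0.4 * x1, 0.0],
            [0.3 * x2, 0.3 * x1 + 0.4 * x3, 0.4 * x2],
            [0.3 * x3, 0.0, 0.3 * x1]]


def solve(a, b):
    """Solve a * y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    y = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * y[c] for c in range(r + 1, n))
        y[r] = (m[r][n] - s) / m[r][r]
    return y


def newton_step(x):
    """One step x + (Id - f'(x))^{-1} (f(x) - x) (assumed operator form)."""
    n = len(x)
    jac = jacobian(x)
    a = [[(1.0 if i == j else 0.0) - jac[i][j] for j in range(n)]
         for i in range(n)]
    fx = f(x)
    d = solve(a, [fx[i] - x[i] for i in range(n)])
    return [x[i] + d[i] for i in range(n)]


def newton_with_upper_bound(steps, c_min=0.3):
    """Return (lower bound, upper bound, slack) after `steps` Newton steps."""
    prev, cur = None, [0.0, 0.0, 0.0]
    for _ in range(steps):
        prev, cur = cur, newton_step(cur)
    n = len(cur)
    step_norm = max(abs(cur[i] - prev[i]) for i in range(n))
    slack = step_norm / (c_min * min(min(cur), 1.0)) ** n
    return cur, [v + slack for v in cur], slack


lower, upper, slack = newton_with_upper_bound(10)
print("lower:", lower)
print("upper:", upper)
```

If every entry of `upper` is below 1, the computation certifies $\mu f_s < 1$ for all components, which is the conclusion drawn in the example.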
Example 5.17. Consider again the SPP $f$ from Example 5.16. Setting

\[ u^{(k)} := \nu^{(k)} + \left[ \frac{\|\nu^{(k)} - \nu^{(k-1)}\|_\infty}{(0.3 \cdot \nu^{(k)}_2)^3} \right], \]

Theorem 5.15 guarantees

\[ \nu^{(k)} \le \mu f \le u^{(k)}. \]

Let us measure the tightness of the bounds $\nu^{(k)}$ and $u^{(k)}$ on $\mu f$ in the first component. Let

\[ p_{lower}(k) := -\log_2(\mu f_1 - \nu^{(k)}_1) \quad \text{and} \quad p_{upper}(k) := -\log_2(u^{(k)}_1 - \mu f_1). \]

Roughly speaking, $\nu^{(k)}_1$ and $u^{(k)}_1$ have $p_{lower}(k)$ and $p_{upper}(k)$ valid bits of $\mu f_1$, respectively. Figure 3 shows $p_{lower}(k)$ and $p_{upper}(k)$ for $k \in \{1, \ldots, 11\}$.

It can be seen that the slope of $p_{lower}(k)$ is approximately 1 for $k = 2, \ldots, 6$. This corresponds to the linear convergence of Newton's method according to Theorem 5.9. Since $\mathrm{Id} - f'(\mu f)$ is non-singular, Newton's method actually has, asymptotically, an exponential convergence order, cf. Theorem 4.12. (In fact, the matrix is "almost" singular, with $\det(\mathrm{Id} - f'(\mu f)) \approx 0.006$.) This behavior can be observed in Figure 3 for $k \ge 7$. For $p_{upper}$, we roughly have (using $\nu^{(k)} \approx \mu f$):

\[ p_{upper}(k) \approx p_{lower}(k-1) + \log\left((0.3 \cdot \mu f_2)^3\right) \approx p_{lower}(k-1) - 5. \]

□

Fig. 3. Number of valid bits of the lower (shape: x) and upper (shape: +) bounds on $\mu f_1$, see Example 5.17.



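The proof of Theorem 5.14 uses techniques similar to those of the proof of Theorem 5.9, in particular Lemma 5.11.

Proof (of Theorem 5.14). By Proposition 5.4, $f$ has a cone vector $d$. Let $d_{\min}$ and $d_{\max}$ be the smallest and the largest component of $d$, respectively. Let $\lambda_{\max} := \max_j \{ \frac{\mu f_j - x_j}{d_j} \}$, and let w.l.o.g. $\lambda_{\max} = \frac{\mu f_1 - x_1}{d_1}$. We have $x \ge \mu f - \lambda_{\max} d$, so we can apply Lemma 5.6 to obtain $\mathcal{N}(x) \ge \mu f - \frac{1}{2} \lambda_{\max} d$. Thus

\[ \|\mathcal{N}(x) - x\|_\infty \ge (\mathcal{N}(x) - x)_1 \ge \mu f_1 - \tfrac{1}{2} \lambda_{\max} d_1 - x_1 = \tfrac{1}{2} \lambda_{\max} d_1 \ge \tfrac{1}{2} \lambda_{\max} d_{\min}. \]

On the other hand, with Lemma 4.5 we have $0 \le \mu f - \mathcal{N}(x) \le \frac{1}{2} \lambda_{\max} d$ and so $\|\mu f - \mathcal{N}(x)\|_\infty \le \frac{1}{2} \lambda_{\max} d_{\max}$. Combining those inequalities we obtain

\[ \frac{\|\mathcal{N}(x) - x\|_\infty}{\|\mu f - \mathcal{N}(x)\|_\infty} \ge \frac{d_{\min}}{d_{\max}}. \]

Now the statement follows from Lemma 5.11.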

6 General SPPs 

In § 5 we considered strongly connected SPPs, see Definition 2.6. However, it is not always guaranteed that the SPP $f$ is strongly connected. In this section we analyze the convergence speed of two variants of Newton's method that both compute approximations of $\mu f$, where $f$ is a clean and feasible SPP that is not necessarily strongly connected ("general SPPs").

The first one was suggested by Etessami and Yannakakis [EY09] and is called Decomposed Newton's Method (DNM). It works by running Newton's method separately on each SCC, see § 6.1. The second one is the regular Newton's method from § 4. We will analyze its convergence speed in § 6.2.

The reason why we first analyze DNM is that our convergence speed results about Newton's method for general SPPs (Theorem 6.5) build on our results about DNM (Theorem 6.2). From an efficiency point of view it actually may be advantageous to run Newton's method separately on each SCC. For those reasons DNM deserves a separate treatment.

6.1 Convergence Speed of the Decomposed Newton's Method (DNM) 

DNM, originally suggested in [EY09], works as follows. It starts by using Newton's method for each bottom SCC, say $S$, of the SPP $f$. Then the corresponding variables $X_S$ are replaced by the obtained approximation for $\mu f_S$, and the corresponding equations $X_S = f_S(X)$ are removed. The same procedure is then applied to the new bottom SCCs, until all SCCs have been processed.

Etessami and Yannakakis did not provide a particular criterion for the number of Newton 
iterations to be applied in each SCC. Consequently, they did not analyze the convergence speed 
of DNM. We will treat those issues in this section, thereby taking advantage of our previous 
analysis of scSPPs. 

We fix a quadratic, clean and feasible SPP $f$ for this section. We assume that we have already computed the DAG (directed acyclic graph) of SCCs. This can be done in linear time in the size of $f$. To each SCC $S$ we can associate its depth $t$: it is the length of the longest path in the DAG of SCCs from $S$ to a top SCC. Notice that $0 \le t \le n - 1$. We write $SCC(t)$ for the set of SCCs of depth $t$. We define the height $h(f)$ as the largest depth of an SCC and the width $w(f) := \max_t |SCC(t)|$ as the largest number of SCCs of the same depth. Notice that $f$ has at most $(h(f) + 1) \cdot w(f)$ SCCs. Further we define the component sets $[t] := \bigcup_{S \in SCC(t)} S$ and $[>t] := \bigcup_{t' > t} [t']$ and similarly $[<t]$.



function DNM$(f, i)$                                  /* The parameter $i$ controls the precision. */
    for $t$ from $h(f)$ downto $0$
        forall $S \in SCC(t)$                         /* for all SCCs $S$ of depth $t$ */
            $\rho^{(i)}_S := \mathcal{N}^{\,i \cdot 2^t}_{f_S}(0)$            /* perform $i \cdot 2^t$ Newton iterations */
            $f_{[<t]} := f_{[<t]}[S/\rho^{(i)}_S]$                /* apply $\rho^{(i)}_S$ in the upper SCCs */
    return $\rho^{(i)}$

Fig. 4. Decomposed Newton's Method (DNM) for computing an approximation $\rho^{(i)}$ of $\mu f$.

Figure 4 shows our version of DNM. We suggest running Newton's method in each SCC $S$ for a number of steps that depends (exponentially) on the depth of $S$ and (linearly) on a parameter $i$ that controls the precision.

Proposition 6.1. The function DNM$(f, i)$ of Figure 4 runs at most $i \cdot w(f) \cdot 2^{h(f)+1} \le i \cdot n \cdot 2^n$ iterations of Newton's method.

Proof. The number of iterations is $\sum_{t=0}^{h(f)} |SCC(t)| \cdot i \cdot 2^t$. This can be estimated as follows.

\[
\begin{aligned}
\sum_{t=0}^{h(f)} |SCC(t)| \cdot i \cdot 2^t &\le w(f) \cdot i \cdot \sum_{t=0}^{h(f)} 2^t \\
&\le w(f) \cdot i \cdot 2^{h(f)+1} \\
&\le i \cdot n \cdot 2^n && \text{(as } w(f) \le n \text{ and } h(f) < n\text{)}
\end{aligned}
\]

□
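The procedure of Figure 4 and the count of Proposition 6.1 can be illustrated on a small special case. The Python sketch below is an illustration under assumptions, not the authors' implementation: it only handles a chain in which every SCC consists of a single variable, namely $X_1 = \frac{1}{2} + \frac{1}{2} X_1^2$ and $X_j = \frac{1}{4}(X_{j-1} + X_j)^2$ for $j \ge 2$, so that variable $j$ has depth $n - j$, $h(f) = n - 1$ and $w(f) = 1$. The function name `dnm_chain` is hypothetical.

```python
def dnm_chain(n, i):
    """Sketch of DNM(f, i) for a chain of n one-variable SCCs.

    Returns the approximation rho and the total number of Newton steps.
    Each scalar equation has the form X = c0 + c1*X + c2*X^2.
    """
    rho = [0.0] * n
    total_steps = 0
    for t in range(n - 1, -1, -1):           # t from h(f) = n-1 down to 0
        j = n - 1 - t                         # 0-based index of the SCC of depth t
        if j == 0:
            c0, c1, c2 = 0.5, 0.0, 0.5        # X_1 = 1/2 + 1/2 X_1^2
        else:
            a = rho[j - 1]                    # value substituted from the lower SCC
            c0, c1, c2 = a * a / 4.0, a / 2.0, 0.25
        x = 0.0
        for _ in range(i * 2 ** t):           # i * 2^t Newton iterations
            slope = 1.0 - (c1 + 2.0 * c2 * x)  # 1 - g'(x)
            if slope <= 0.0:                  # guard against floating-point saturation
                break
            x += (c0 + c1 * x + c2 * x * x - x) / slope
            total_steps += 1
        rho[j] = x
    return rho, total_steps


rho, steps = dnm_chain(3, 4)
print(rho, steps)
```

For this chain the total number of Newton steps is $i \cdot (2^n - 1)$, which stays below the bound $i \cdot w(f) \cdot 2^{h(f)+1} = i \cdot 2^n$ of Proposition 6.1.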

The following theorem states that DNM has linear convergence order. 

Theorem 6.2. Let $f$ be a quadratic, clean and feasible SPP. Let $\rho^{(i)}$ denote the result of calling DNM$(f, i)$ (see Figure 4). Let $\beta_\rho$ denote the convergence order of $(\rho^{(i)})_{i \in \mathbb{N}}$. Then there is a $k_f \in \mathbb{N}$ such that $\beta_\rho(k_f + i) \ge i$ for all $i \in \mathbb{N}$.

Theorem 6.2 can be interpreted as follows: Increasing $i$ by one yields asymptotically at least one additional bit in each component and, by Proposition 6.1, costs at most $n \cdot 2^n$ additional Newton iterations. Notice that for simplicity we do not take into account here that the cost of performing a Newton step on a single SCC is not uniform, but rather depends on the size of the SCC (e.g. cubically if Gaussian elimination is used for solving the linear systems).

For the proof of Theorem 6.2, let $\Delta^{(i)}$ denote the error when running DNM with parameter $i$, i.e., $\Delta^{(i)} := \mu f - \rho^{(i)}$. Observe that the error $\Delta^{(i)}_{[t]}$ can be understood as the sum of two errors:

\[ \Delta^{(i)}_{[t]} = \mu f_{[t]} - \rho^{(i)}_{[t]} = (\mu f_{[t]} - \widetilde\mu^{(i)}_{[t]}) + (\widetilde\mu^{(i)}_{[t]} - \rho^{(i)}_{[t]}), \]

where $\widetilde\mu^{(i)}_{[t]} := \mu\big(f_{[t]}[[>t]/\rho^{(i)}_{[>t]}]\big)$, i.e., $\widetilde\mu^{(i)}_{[t]}$ is the least fixed point of $f_{[t]}$ after the approximations from the lower SCCs have been applied. So, $\Delta^{(i)}_{[t]}$ consists of the propagation error $(\mu f_{[t]} - \widetilde\mu^{(i)}_{[t]})$ (resulting from the error at lower SCCs) and the approximation error $(\widetilde\mu^{(i)}_{[t]} - \rho^{(i)}_{[t]})$ (resulting from the newly added error of Newton's method on level $t$).

The following lemma gives a bound on the propagation error.

Lemma 6.3 (Propagation error). There is a constant $C_f > 0$ such that

\[ \|\mu f_{[t]} - \widetilde\mu_{[t]}\| \le C_f \cdot \sqrt{\|\mu f_{[>t]} - \rho_{[>t]}\|} \]

holds for all $\rho_{[>t]}$ with $0 \le \rho_{[>t]} \le \mu f_{[>t]}$, where $\widetilde\mu_{[t]} = \mu\big(f_{[t]}[[>t]/\rho_{[>t]}]\big)$.

Roughly speaking, Lemma 6.3 states that if $\rho_{[>t]}$ has $k$ valid bits of $\mu f_{[>t]}$, then $\widetilde\mu_{[t]}$ has at least about $k/2$ valid bits of $\mu f_{[t]}$. In other words, (at most) one half of the valid bits are lost on each level of the DAG due to the propagation error. The proof of Lemma 6.3 is technically involved and, unfortunately, not constructive in that we know nothing about $C_f$ except for its existence. Therefore, the statements in this section are independent of a particular norm. The proof of Lemma 6.3 can be found in Appendix A.



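The following lemma gives a bound on the error $\Delta^{(i)}_{[t]}$ on level $t$, taking both the propagation error and the approximation error into account.

Lemma 6.4. There is a $C_f > 0$ such that

\[ \|\Delta^{(i)}_{[t]}\| \le 2^{C_f - i \cdot 2^t} \quad \text{for all } i \in \mathbb{N}. \]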



Proof. Let $f^{(i)}_{[t]} := f_{[t]}[[>t]/\rho^{(i)}_{[>t]}]$. Observe that the coefficients of $f^{(i)}_{[t]}$ and thus its least fixed point $\widetilde\mu^{(i)}_{[t]}$ are monotonically increasing with $i$, because $\rho^{(i)}_{[>t]}$ is monotonically increasing as well. Consider an arbitrary depth $t$ and choose real numbers $c_{\min} > 0$ and $\mu_{\min} > 0$ and an integer $i_0$ such that, for all $i \ge i_0$, $c_{\min}$ and $\mu_{\min}$ are lower bounds on the smallest nonzero coefficient of $f^{(i)}_{[t]}$ and the smallest component of $\widetilde\mu^{(i)}_{[t]}$, respectively. Let $\mu_{\max}$ be the largest component of $\mu f_{[t]}$. Let

\[ k := \left\lceil n \cdot \log \frac{\mu_{\max}}{c_{\min} \cdot \mu_{\min} \cdot \min\{\mu_{\min}, 1\}} \right\rceil. \]

Then it follows from Theorem 5.9 that performing $k + j$ Newton iterations ($j \ge 0$) on depth $t$ yields $j$ valid bits of $\widetilde\mu^{(i)}_{[t]}$ for any $i \ge i_0$. In particular, $k + i \cdot 2^t$ Newton iterations give $i \cdot 2^t$ valid bits of $\widetilde\mu^{(i)}_{[t]}$ for any $i \ge i_0$. So there exists a constant $c_1 > 0$ such that, for all $i \ge i_0$,

\[ \|\widetilde\mu^{(i)}_{[t]} - \rho^{(i)}_{[t]}\| \le 2^{c_1 - i \cdot 2^t}, \qquad (13) \]

because DNM (see Figure 4) performs $i \cdot 2^t$ iterations to compute $\rho^{(i)}_S$ where $S$ is an SCC of depth $t$. Choose $c_1$ large enough such that Equation (13) holds for all $i \ge 0$ and all depths $t$.

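Now we can prove the lemma by induction on $t$. In the base case ($t = h(f)$) there is no propagation error, so the claim of the lemma follows from (13). Let $t < h(f)$. Then

\[
\begin{aligned}
\|\Delta^{(i)}_{[t]}\| &= \|\mu f_{[t]} - \widetilde\mu^{(i)}_{[t]} + \widetilde\mu^{(i)}_{[t]} - \rho^{(i)}_{[t]}\| \\
&\le \|\mu f_{[t]} - \widetilde\mu^{(i)}_{[t]}\| + \|\widetilde\mu^{(i)}_{[t]} - \rho^{(i)}_{[t]}\| \\
&\le \|\mu f_{[t]} - \widetilde\mu^{(i)}_{[t]}\| + 2^{c_1 - i \cdot 2^t} && \text{(by (13))} \\
&\le c_2 \cdot \sqrt{\|\Delta^{(i)}_{[>t]}\|} + 2^{c_1 - i \cdot 2^t} && \text{(Lemma 6.3)} \\
&\le c_2 \cdot \sqrt{2^{c_3 - i \cdot 2^{t+1}}} + 2^{c_1 - i \cdot 2^t} && \text{(induction hypothesis)} \\
&\le 2^{c_4 - i \cdot 2^t}
\end{aligned}
\]

for some constants $c_2, c_3, c_4 > 0$. □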
Now Theorem 6.2 follows easily.

Proof (of Theorem 6.2). From Lemma 6.4 we deduce that for each component $j \in [t]$ there is a $c_j$ such that

\[ (\mu f_j - \rho^{(i)}_j) / \mu f_j \le 2^{c_j - i \cdot 2^t} \le 2^{c_j - i}. \]

Let $k_f \ge c_j$ for all $1 \le j \le n$. Then

\[ (\mu f_j - \rho^{(i + k_f)}_j) / \mu f_j \le 2^{c_j - (i + k_f)} \le 2^{-i}. \]

Notice that, unfortunately, we cannot give a bound on $k_f$, mainly because Lemma 6.3 does not provide a bound on $C_f$.



6.2 Convergence Speed of Newton's Method 

We use Theorem 6.2 to prove the following theorem for the regular (i.e. not decomposed) Newton sequence $(\nu^{(i)})_{i \in \mathbb{N}}$.

Theorem 6.5. Let $f$ be a quadratic, clean and feasible SPP. There is a threshold $k_f \in \mathbb{N}$ such that $\beta(k_f + i \cdot n \cdot 2^n) \ge \beta(k_f + i \cdot (h(f) + 1) \cdot 2^{h(f)}) \ge i$ for all $i \in \mathbb{N}$.

In the rest of the section we prove this theorem by a sequence of lemmata. The following lemma states that a Newton step is not faster on an SCC, if the values of the lower SCCs are fixed.

Lemma 6.6. Let $f$ be a clean and feasible SPP. Let $0 \le x \le f(x) \le \mu f$ such that $f'(x)^*$ exists. Let $S$ be an SCC of $f$ and let $L$ denote the set of components that are not in $S$, but on which a variable in $S$ depends. Then $(\mathcal{N}_f(x))_S \ge \mathcal{N}_{f_S[L/x_L]}(x_S)$.

Proof.

\[
\begin{aligned}
(\mathcal{N}_f(x))_S &= x_S + \big(f'(x)^* (f(x) - x)\big)_S \\
&= x_S + f'(x)^*_{SS} (f(x) - x)_S + f'(x)^*_{SL} (f(x) - x)_L \\
&\ge x_S + f'(x)^*_{SS} (f(x) - x)_S \\
&= x_S + \big((f_S[L/x_L])'(x_S)\big)^* \big(f_S[L/x_L](x_S) - x_S\big) \\
&= \mathcal{N}_{f_S[L/x_L]}(x_S)
\end{aligned}
\]

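Recall Lemma 4.9 which states that the Newton operator $\mathcal{N}$ is monotone. This fact and Lemma 6.6 can be combined to the following lemma stating that $i \cdot (h(f) + 1)$ iterations of the regular Newton's method "dominate" a decomposed Newton's method that performs $i$ Newton steps in each SCC.

Lemma 6.7. Let $\widetilde\nu^{(i)}$ denote the result of a decomposed Newton's method which performs $i$ iterations of Newton's method in each SCC. Let $\nu^{(i)}$ denote the result of $i$ iterations of the regular Newton's method. Then $\nu^{(i \cdot (h(f)+1))} \ge \widetilde\nu^{(i)}$.

Proof. Let $h = h(f)$. Let $[t]$ and $[>t]$ again denote the set of components of depth $t$ and $> t$, respectively. We show by induction on the depth $t$:

\[ \nu^{(i \cdot (h+1-t))}_{[t]} \ge \widetilde\nu^{(i)}_{[t]}. \]

The induction base ($t = h$) is clear, because for bottom SCCs the two methods are identical. Let now $t < h$. Then

\[
\begin{aligned}
\nu^{(i \cdot (h+1-t))}_{[t]} &= \big(\mathcal{N}^{\,i}_f(\nu^{(i \cdot (h-t))})\big)_{[t]} \\
&\ge \mathcal{N}^{\,i}_{f_{[t]}[[>t]/\nu^{(i \cdot (h-t))}_{[>t]}]}\big(\nu^{(i \cdot (h-t))}_{[t]}\big) && \text{(Lemma 6.6)} \\
&\ge \mathcal{N}^{\,i}_{f_{[t]}[[>t]/\widetilde\nu^{(i)}_{[>t]}]}\big(\nu^{(i \cdot (h-t))}_{[t]}\big) && \text{(induction hypothesis)} \\
&\ge \mathcal{N}^{\,i}_{f_{[t]}[[>t]/\widetilde\nu^{(i)}_{[>t]}]}\big(0_{[t]}\big) && \text{(Lemma 4.9)} \\
&= \widetilde\nu^{(i)}_{[t]} && \text{(definition of } \widetilde\nu^{(i)}\text{)}
\end{aligned}
\]

Now, the lemma itself follows by using Lemma 4.9 once more. □

As a side note, observe that the above proof of Lemma 6.7 implicitly benefits from the fact that SCCs of the same depth are independent. So, SCCs with the same depth are handled in parallel by the regular Newton's method. Therefore, $w(f)$, the width of $f$, is irrelevant here (cf. Proposition 6.1).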

Now we can prove Theorem 6.5. 

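Proof (of Theorem 6.5). Let $k_2$ be the $k_f$ of Theorem 6.2, and let $k_1 = k_2 \cdot (h(f) + 1) \cdot 2^{h(f)}$. Then we have

\[
\begin{aligned}
\nu^{(k_1 + i \cdot (h(f)+1) \cdot 2^{h(f)})} &= \nu^{((k_2 + i) \cdot (h(f)+1) \cdot 2^{h(f)})} \\
&\ge \widetilde\nu^{((k_2 + i) \cdot 2^{h(f)})} && \text{(Lemma 6.7)} \\
&\ge \rho^{(k_2 + i)},
\end{aligned}
\]

where the last step follows from the fact that DNM$(f, k_2 + i)$ runs at most $(k_2 + i) \cdot 2^{h(f)}$ iterations in every SCC. By Theorem 6.2, $\rho^{(k_2 + i)}$ and hence $\nu^{(k_1 + i \cdot (h(f)+1) \cdot 2^{h(f)})}$ have $i$ valid bits of $\mu f$. Therefore, Theorem 6.5 holds with $k_f = k_1$. □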



7 Upper Bounds on the Convergence 

In this section we show that the lower bounds on the convergence order of Newton's method 
that we obtained in the previous section are essentially tight, meaning that an exponential (in n) 
number of iterations may be needed per bit. 

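More precisely, we expose a family $(f^{(n)})_{n \ge 1}$ of SPPs with $n$ variables, such that more than $k \cdot 2^{n-1}$ iterations are needed for $k$ valid bits. Consider the following system.

\[
X = f^{(n)}(X) =
\begin{pmatrix}
\frac{1}{2} + \frac{1}{2} X_1^2 \\
\frac{1}{4} X_1^2 + \frac{1}{2} X_1 X_2 + \frac{1}{4} X_2^2 \\
\vdots \\
\frac{1}{4} X_{n-1}^2 + \frac{1}{2} X_{n-1} X_n + \frac{1}{4} X_n^2
\end{pmatrix}
\qquad (14)
\]

The only solution of (14) is $\mu f^{(n)} = (1, \ldots, 1)^\top$. Notice that each component of $f^{(n)}$ is an SCC. We prove the following theorem.

Theorem 7.1. The convergence order of Newton's method applied to the SPP $f^{(n)}$ from (14) (with $n \ge 2$) satisfies

\[ \beta(k \cdot 2^{n-1}) < k \quad \text{for all } k \in \{1, 2, \ldots\}. \]

In particular, $\beta(2^{n-1}) = 0$.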

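Proof. We write $f := f^{(n)}$ for simplicity. Let

\[ \Delta^{(i)} := \mu f - \nu^{(i)} = (1, \ldots, 1)^\top - \nu^{(i)}. \]

Notice that $(\nu^{(i)}_1)_{i \in \mathbb{N}} = (0, \frac{1}{2}, \frac{3}{4}, \frac{7}{8}, \ldots)$ which is the same sequence as obtained by applying Newton's method to the 1-dimensional system $X_1 = \frac{1}{2} + \frac{1}{2} X_1^2$. So we have $\Delta^{(i)}_1 = 2^{-i}$, i.e., after $i$ iterations we have exactly $i$ valid bits in the first component.

We know from Theorem 4.1 that for all $j$ with $1 \le j \le n - 1$ we have $\nu^{(i)}_{j+1} \le f_{j+1}(\nu^{(i)}) = \frac{1}{4} (\nu^{(i)}_j)^2 + \frac{1}{2} \nu^{(i)}_j \nu^{(i)}_{j+1} + \frac{1}{4} (\nu^{(i)}_{j+1})^2$ and $\nu^{(i)}_{j+1} \le 1$. It follows that $\nu^{(i)}_{j+1}$ is at most the least solution of $X_{j+1} = \frac{1}{4} (\nu^{(i)}_j)^2 + \frac{1}{2} \nu^{(i)}_j X_{j+1} + \frac{1}{4} (X_{j+1})^2$, and so

\[ \Delta^{(i)}_{j+1} \ge 2 \sqrt{\Delta^{(i)}_j} - \Delta^{(i)}_j \ge \sqrt{\Delta^{(i)}_j}, \]

where the last inequality is strict if $0 < \Delta^{(i)}_j < 1$.

By induction it follows that $\Delta^{(i)}_{j+1} > (\Delta^{(i)}_1)^{2^{-j}}$ for all $i \ge 1$. In particular,

\[ \Delta^{(k \cdot 2^{n-1})}_n > \left(\Delta^{(k \cdot 2^{n-1})}_1\right)^{2^{-(n-1)}} = 2^{-k \cdot 2^{n-1} \cdot 2^{-(n-1)}} = 2^{-k}. \]

Hence, after $k \cdot 2^{n-1}$ iterations we have fewer than $k$ valid bits.

□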
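The slow convergence claimed by Theorem 7.1 can be observed directly. The Python sketch below is an illustration, not taken from the paper: it assumes the system (14) with $f_1 = \frac{1}{2} + \frac{1}{2} X_1^2$ and $f_j = \frac{1}{4}(X_{j-1} + X_j)^2$ for $j \ge 2$, and the Newton step $x + (\mathrm{Id} - f'(x))^{-1}(f(x) - x)$. Because $\mathrm{Id} - f'(x)$ is lower bidiagonal for this system, the linear system is solved by forward substitution. The function name `newton_errors` is hypothetical.

```python
def newton_errors(n, steps):
    """Run Newton's method from 0 on the system (14).

    Returns the error vector Delta = (1, ..., 1) - nu after `steps` steps.
    """
    x = [0.0] * n
    for _ in range(steps):
        d = [0.0] * n
        # Component 1: X_1 = 1/2 + 1/2 X_1^2, with derivative X_1.
        d[0] = (0.5 + 0.5 * x[0] ** 2 - x[0]) / (1.0 - x[0])
        # Components j >= 2: X_j = 1/4 (X_{j-1} + X_j)^2.
        for j in range(1, n):
            s = 0.5 * (x[j - 1] + x[j])            # both partial derivatives of f_j
            residual = 0.25 * (x[j - 1] + x[j]) ** 2 - x[j]
            # Row j of (Id - f'(x)) d = f(x) - x reads -s*d[j-1] + (1-s)*d[j] = residual.
            d[j] = (residual + s * d[j - 1]) / (1.0 - s)
        x = [x[j] + d[j] for j in range(n)]
    return [1.0 - v for v in x]


for n, k in [(2, 1), (3, 1), (3, 2), (4, 2)]:
    err = newton_errors(n, k * 2 ** (n - 1))
    print(n, k, err[0], err[-1])
```

In each printed line the error of the first component equals $2^{-i}$ for the number $i$ of steps taken, while the error of the last component stays above $2^{-k}$, in line with the theorem.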



Notice that the proof exploits that an error in the first component gets "amplified" along the DAG of SCCs. One can also show along those lines that computing $\mu f$ is an ill-conditioned problem: Consider the SPP $g^{(n,\varepsilon)}$ obtained from $f^{(n)}$ by replacing the first component by $1 - \varepsilon$ where $0 \le \varepsilon < 1$. If $\varepsilon = 0$ then $(\mu g^{(n,\varepsilon)})_n = 1$, whereas if $\varepsilon = \frac{1}{2^{2^{n-1}}}$ then $(\mu g^{(n,\varepsilon)})_n < \frac{1}{2}$. In other words, to get 1 bit of precision of $\mu g$ one needs exponentially in $n$ many bits in $g$. Note that this observation is independent from any particular method to compute or approximate the least fixed point.
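The ill-conditioning can be made concrete with a closed-form recursion. The sketch below rests on an assumption that is not spelled out in the text: for a fixed value $a$ of $X_{j-1}$, the least solution of $X_j = \frac{1}{4}(a + X_j)^2$ is $2 - a - 2\sqrt{1 - a}$, so the distance $\delta_j := 1 - (\mu g)_j$ satisfies $\delta_j = 2\sqrt{\delta_{j-1}} - \delta_{j-1}$ with $\delta_1 = \varepsilon$. The function name `last_component` is hypothetical.

```python
import math


def last_component(n, eps):
    """Return the n-th component of the least fixed point of g^(n, eps)."""
    delta = eps                      # 1 - (mu g)_1, since the first component is 1 - eps
    for _ in range(n - 1):
        delta = 2.0 * math.sqrt(delta) - delta
    return 1.0 - delta


n = 5
print(last_component(n, 0.0))                        # unperturbed system
print(last_component(n, 2.0 ** -(2 ** (n - 1))))     # perturbation 2^(-2^(n-1))
```

A perturbation of size $2^{-2^{n-1}}$ in the first component already pushes the last component below $\frac{1}{2}$.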
8 Geometrical Aspects of SPPs

As shown in § 4.4 we can assume that $f$ consists of quadratic polynomials. For quadratic polynomials the locus of zeros is also called a quadric surface, or more commonly quadric. Quadrics are one of the most fundamental classes of hypersurfaces. It is therefore natural to study the quadrics induced by a quadratic SPP $f$, and how the Newton sequence is connected to these surfaces.

Let us write $q$ for $f - X$. Every component $q_i$ of $q$ is also a quadratic polynomial each defining a quadric denoted by

\[ Q_i := \{x \in \mathbb{R}^n \mid q_i(x) = f_i(x) - x_i = 0\}. \]

Finding $\mu f$ thus corresponds to finding the least non-negative point of intersection of these $n$ quadrics $Q_i$.

Example 8.1. Consider the SPP $f$ given by

\[ f(X, Y) = \begin{pmatrix} \frac{1}{2} X^2 + \frac{1}{4} Y^2 + \frac{1}{4} \\ \frac{1}{4} X + \frac{1}{4} XY + \frac{1}{4} Y^2 + \frac{1}{4} \end{pmatrix} \]

leading to

\[ q_1(X, Y) = \tfrac{1}{2} X^2 + \tfrac{1}{4} Y^2 + \tfrac{1}{4} - X \quad \text{and} \quad q_2(X, Y) = \tfrac{1}{4} X + \tfrac{1}{4} XY + \tfrac{1}{4} Y^2 + \tfrac{1}{4} - Y. \]

Using standard techniques from linear algebra one can show that $q_1$ defines an ellipse while $q_2$ describes a parabola (see Figure 5). □




Fig. 5. (a) The quadrics induced by the SPP from Example 8.1 with "$q_1 = 0$" an ellipse, and "$q_2 = 0$" a parabola. (b) Close-up view of the region important for determining $\mu f$. The crosses show the Newton approximants of $\mu f$.

Figure 5 shows the two quadrics induced by the SPP $f$ discussed in the example above. In Figure 5 (a) one can recognize one of the two quadrics as an ellipse while the other one is a parabola. In this example the Newton approximants (depicted as crosses) stay within the region enclosed by the coordinate axes and the two quadrics as shown in Figure 5 (b).

In this section we want to show that the above picture in principle is the same for all clean and feasible scSPPs. That is, we show that the Newton (and Kleene) approximants always stay in the region enclosed by the coordinate axes and the quadrics. We characterize this region and study some of the properties of the quadrics restricted to this region. This eventually leads to a generalization of Newton's method (Theorem 8.13). We close the section by showing that this new method converges at least as fast as Newton's method. All missing proofs can be found in the appendix.

Let us start with the properties of the quadrics $Q_i$. We restrict our attention to the region $[0, \mu f)$. For this we set

\[ M_i := Q_i \cap [0, \mu f) = \{x \in [0, \mu f) \mid q_i(x) = 0\}. \]

We start by showing that for every $x \in M_i$ the gradient $q_i'(x)$ in $x$ at $M_i$ does not vanish. As $q_i'(x)$ is perpendicular to the tangent plane in $x$ at $M_i$, this means that the normal of the tangent plane is determined by $q_i'(x)$ (up to orientation). See Figure 6 for an example. This will later allow us to apply the implicit function theorem.




Fig. 6. The normals (scaled down) of the quadrics from Example 8.1.



Lemma 8.2. For every quadric $q_i$ induced by a clean and feasible scSPP $f$ we have

\[ q_i'(x) = (\partial_{X_1} q_i(x), \partial_{X_2} q_i(x), \ldots, \partial_{X_n} q_i(x)) \ne 0 \quad \text{and} \quad \partial_{X_i} q_i(x) < 0 \quad \forall x \in [0, \mu f). \]

In the following, for $i \in \{1, \ldots, n\}$ we write $x_{-i}$ for the vector $(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n)$ and define $(x_{-i}, x_i)$ to also denote the original vector $x$.

We next show that there exists a complete parametrization of "the lower part" of $M_i$. With "lower part" we refer to the set

\[ S_i := \{x \in M_i \mid \forall y \in M_i : (x_{-i} = y_{-i}) \Rightarrow x_i \le y_i\}, \]

i.e., the points $x \in M_i$ such that there is no point $y$ with the same non-$i$-components but smaller $i$-component. Taking a look at Figure 5, the surfaces $S_1$ and $S_2$ are those parts of $M_1$, resp. $M_2$, which delimit that part of $\mathbb{R}^2_{\ge 0}$ shown in Figure 5 (b).

If $x \in S_i$ then $x_i$ is the least non-negative root of the (at most) quadratic polynomial $q_i(X_i, x_{-i})$. As we will see, these roots can also be represented by the following functions:

Definition 8.3. For a clean and feasible scSPP $f$ we define for $k \in \mathbb{N}$ the polynomial $h_i^{(k)}$ by

The function $h_i(X)$ is then defined pointwise by

\[ h_i(x_{-i}) := \lim_{k \to \infty} h_i^{(k)}(x_{-i}) \]

for all $x_{-i} \in [0, \mu f_{-i}]$.

We show in the appendix (see Proposition B.1) that the function $h_i$ is well-defined and exists. We therefore can parameterize the surface $S_i$ w.r.t. the remaining variables $X_{-i}$, i.e., $h_i$ is the "height" of the surface $S_i$ above the "ground" $X_i = 0$.

By the preceding proposition the map

\[ p_i : [0, \mu f_{-i}) \to [0, \mu f] : x_{-i} \mapsto (x_1, \ldots, x_{i-1}, h_i(x_{-i}), x_{i+1}, \ldots, x_n) \]

gives us a pointwise parametrization of $S_i$. We want to show that $p_i$ is continuously differentiable. For this it suffices to show that $h_i$ is continuously differentiable which follows easily from the implicit function theorem (see e.g. [OR70]).

Lemma 8.4. $h_i$ is continuously differentiable with

\[ \partial_{X_j} h_i(x_{-i}) = - \frac{\partial_{X_j} q_i(x)}{\partial_{X_i} q_i(x)} = \frac{\partial_{X_j} f_i(x)}{1 - \partial_{X_i} f_i(x)} \quad \text{for } x \in S_i \text{ and } j \ne i. \]

In particular, $\partial_{X_j} h_i$ is monotonically increasing with $x$.

Corollary 8.5. The map

\[ p_i : [0, \mu f_{-i}) \to [0, \mu f] : x_{-i} \mapsto (x_1, \ldots, x_{i-1}, h_i(x_{-i}), x_{i+1}, \ldots, x_n) \]

is continuously differentiable and a local parametrization of the manifold $S_i$.

Example 8.6. For the SPP $f$ defined in Example 8.1 we can simply solve $q_1(X, Y)$ for $X$ leading to

\[ h_1(Y) = 1 - \sqrt{\tfrac{1}{2} (1 - Y^2)}. \]

The important point is that by the previous result we know that this function has to be defined on $[0, \mu f_2]$, and differentiable on $[0, \mu f_2)$. Similarly, we get

\[ h_2(X) = 2 - \tfrac{1}{2} X - \tfrac{1}{2} \sqrt{X^2 - 12X + 12}. \]
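The two closed forms of Example 8.6 can be sanity-checked by substituting them back into the quadrics. The Python sketch below assumes the polynomials $q_1(X, Y) = \frac{1}{2} X^2 + \frac{1}{4} Y^2 + \frac{1}{4} - X$ and $q_2(X, Y) = \frac{1}{4} X + \frac{1}{4} XY + \frac{1}{4} Y^2 + \frac{1}{4} - Y$ of Example 8.1. The sample points are arbitrary values in $[0, 0.3]$, chosen only for illustration.

```python
import math


def q1(x, y):
    return 0.5 * x * x + 0.25 * y * y + 0.25 - x


def q2(x, y):
    return 0.25 * x + 0.25 * x * y + 0.25 * y * y + 0.25 - y


def h1(y):
    """Height of the surface S_1 over the Y-axis."""
    return 1.0 - math.sqrt(0.5 * (1.0 - y * y))


def h2(x):
    """Height of the surface S_2 over the X-axis."""
    return 2.0 - 0.5 * x - 0.5 * math.sqrt(x * x - 12.0 * x + 12.0)


for v in [0.0, 0.1, 0.2, 0.3]:
    # Both residuals should vanish up to rounding errors.
    print(v, q1(h1(v), v), q2(v, h2(v)))
```

On these sample points both functions are also increasing in their argument, which matches the monotonicity stated for the functions $h_i$.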

Figure 5 (b) conveys the impression that the surfaces $S_i$ are convex w.r.t. the parameterizations $p_i$. As we have seen, the functions $h_i$ are monotonically increasing. Thus, in the case of two dimensions the functions $h_i$ even have to be strictly monotonically increasing (as $f$ is strongly-connected), so that the surfaces $S_i$ are indeed convex. (Recall that a surface $S$ is convex in a point $x \in S$ if $S$ is located completely on one side of the tangent plane at $S$ in $x$.) But in the case of more than two variables this no longer needs to hold.

Fig. 7. (a) The hyperbolic paraboloid defined by Z = + |XF + ^Y^ + j for $X, Y, Z \in [-10, 10]$. (b) A visualization of an SPP consisting of three copies of the quadric of (a) with fj,f = |, ^) the upper apex. (c) One of the three quadrics of (b) over $[0, \mu f]$. Clearly, even limited to this range the surface is not convex.



Example 8.7. The equation

is an admissible part of any SPP. It defines the hyperbolic paraboloid depicted in Figure 7 which is clearly not convex.

Still, as shown in Lemma 2.3 it holds for all $x, y \ge 0$ that

\[ f(x) + f'(x) \, y \le f(x + y). \]

It now follows (see the following lemma) that the surfaces $S_i$ have the property that for every $x \in [0, \mu f)$ the "relevant" part of $S_i$ for determining $\mu f$, i.e. $S_i \cap [x, \mu f]$, is located on the same side of the tangent plane at $S_i$ in $x$ (see Figure 8).




Fig. 8. The graphic shows the quadric defined by $q_1 = 0$ with the tangent and normal in $x$ at $S_1$. Every point $y$ of $S_1$ above $x$ is located on the same side of the tangent. More precisely, we have $\nabla q_1|_x \cdot (y - x) \le 0$.

Lemma 8.8. For all $x \in S_i$ we have

\[ \forall y \in S_i \cap [x, \mu f] : q_i'(x) \cdot (y - x) \le 0. \]

In particular

\[ \forall y \in S_i \cap [x, \mu f] : y_i \ge x_i + \sum_{j \ne i} \partial_{X_j} h_i(x_{-i}) \cdot (y_j - x_j). \]
Consider now the set

\[ R := \bigcap_{i=1}^{n} \{x \in [0, \mu f) \mid x_i \le h_i(x_{-i})\}, \]

i.e., the region of $[0, \mu f)$ delimited by the coordinate axes and the surfaces $S_i$. Note that the gradient $q_i'(x)$ for $x \in S_i$ points from $S_i$ into $R$ (see Figure 6).

Proposition 8.9. It holds

\[ x \in R \Leftrightarrow x \in [0, \mu f) \wedge q(x) \ge 0. \]

From this last result it now easily follows that $R$ is indeed the region of $[0, \mu f)$ in which all Newton and Kleene steps are located.

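Theorem 8.10. Let $f$ be a clean and feasible scSPP. All Newton and Kleene steps starting from 0 lie within $R$, i.e.

\[ \nu^{(i)}, \kappa^{(i)} \in R \quad (\forall i \in \mathbb{N}). \]

Proof. For an scSPP we have $\kappa^{(i)}, \nu^{(i)} \in [0, \mu f)$ for all $i$. Further, $\kappa^{(i)} \le \kappa^{(i+1)} = f(\kappa^{(i)})$ and $\nu^{(i)} \le f(\nu^{(i)})$ holds for all $i$, too. □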
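Theorem 8.10 can be observed numerically on a two-variable system. The Python sketch below is only an illustration: it assumes the SPP $f(X, Y) = (\frac{1}{2} X^2 + \frac{1}{4} Y^2 + \frac{1}{4},\ \frac{1}{4} X + \frac{1}{4} XY + \frac{1}{4} Y^2 + \frac{1}{4})$ behind Example 8.1, uses the characterization $q(x) \ge 0$ of Proposition 8.9 as membership test for $R$, and takes the Newton step to be $x + (\mathrm{Id} - f'(x))^{-1}(f(x) - x)$. The helper names are hypothetical.

```python
def f(x, y):
    return (0.5 * x * x + 0.25 * y * y + 0.25,
            0.25 * x + 0.25 * x * y + 0.25 * y * y + 0.25)


def q(x, y):
    """q = f - X; a point of [0, mu f) lies in R iff both entries are >= 0."""
    fx, fy = f(x, y)
    return (fx - x, fy - y)


def kleene_step(x, y):
    return f(x, y)


def newton_step(x, y):
    # Solve (Id - f'(x, y)) d = f(x, y) - (x, y) by Cramer's rule.
    a11, a12 = 1.0 - x, -0.5 * y
    a21, a22 = -(0.25 + 0.25 * y), 1.0 - (0.25 * x + 0.5 * y)
    b1, b2 = q(x, y)
    det = a11 * a22 - a12 * a21
    d1 = (b1 * a22 - a12 * b2) / det
    d2 = (a11 * b2 - a21 * b1) / det
    return (x + d1, y + d2)


def iterates(step, count):
    """Return the start point 0 followed by `count` iterates of `step`."""
    point, result = (0.0, 0.0), [(0.0, 0.0)]
    for _ in range(count):
        point = step(*point)
        result.append(point)
    return result


for name, step, count in [("Newton", newton_step, 8), ("Kleene", kleene_step, 30)]:
    worst = min(min(q(*p)) for p in iterates(step, count))
    print(name, "smallest entry of q along the sequence:", worst)
```

Up to rounding errors the smallest entry of $q$ stays non-negative along both sequences, so all computed iterates pass the membership test for $R$.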

In the rest of this section we will use the results regarding $R$ and the surfaces $S_i$ for interpreting Newton's method geometrically and for obtaining a generalization of Newton's method.

The preceding results suggest another way of determining $\mu f$ (see Figure 9): Let $x$ be some point inside of $R$. We may move from $x$ onto one of the surfaces $S_i$ by going upward along the line $x + t \cdot e_i$ which gives us the point $p_i(x_{-i}) = (x_{-i}, h_i(x_{-i}))$. As $x \in R$, we have $x, p_i(x_{-i}) \le \mu f$. Consider now the tangent plane

\[ T_i := \{y \in \mathbb{R}^n \mid q_i'(p_i(x_{-i})) \cdot (y - p_i(x_{-i})) = 0\} \]

at $S_i$ in $p_i(x_{-i})$. Recall that by Lemma 8.8 we have

\[ \forall y \in S_i \cap [p_i(x_{-i}), \mu f) : q_i'(p_i(x_{-i})) \cdot (y - p_i(x_{-i})) \le 0, \]

i.e., the part of $S_i$ relevant for determining $\mu f$ is located completely below (w.r.t. $q_i'(p_i(x_{-i}))$) this tangent plane. By continuity this also has to hold for $y = \mu f$. Hence, when taking the intersection of all the tangent planes $T_1$ to $T_n$ this gives us again a point $\mathcal{T}(x)$ inside of $R$. That this point $\mathcal{T}(x)$ exists and is uniquely determined is shown in the following lemma.

Lemma 8.11. Let $f$ be a clean and feasible scSPP. Let $x^{(1)}, \ldots, x^{(n)} \in [0, \mu f)$. Then the matrix

\[ \begin{pmatrix} q_1'(x^{(1)}) \\ \vdots \\ q_n'(x^{(n)}) \end{pmatrix} \]

is regular, i.e., the vectors $\{q_i'(x^{(i)}) \mid i = 1, \ldots, n\}$ are linearly independent.

Fig. 9. Given a point $x$ inside of $R$ the intersection of the tangents at the quadrics in the points $p_1(x_2)$, resp. $p_2(x_1)$ is also located inside of $R$, yielding a better approximation of $\mu f$.

By this lemma the normals at the quadrics in the points p^ix^i) for x G [0,/i/) are linearly 
independent. Thus, there exists a unique point of intersection of tangent planes at the quadrics in 
these points. Of course, in general the values hi{x_i) can be irrational. The following definition 
takes this in account by only requiring that underapproximations rji of hi{x_i) are known. 

Definition 8.12. Let x <E R. For i = l,...,n fix some rji G [xi, hi{x^i)], and set t] — 
(771, . . . ,r]n)- then let 7^(a;) denote the solution of 



We drop the subscript and simply write T in the case of rji = hi{x^i) for i — 1, . . . ,n. 
Note that the operator is the Newton operator TV. 

Theorem 8.13. Let f be a clean and feasible scSPP. Let x ^ R. For i — fix some 

rji £ [a;,;, /ii(a;_,;)], and set r) = (rji, . . . ,rin). We then have 



Further, the operator $\mathcal{T}$ is monotone on $R$, i.e., for any $y \in R$ with $x \le y$ it holds that

\[ \mathcal{T}(x) \le \mathcal{T}(y). \]

By Theorem 8.13, replacing the Newton operator $\mathcal{N}$ by $\mathcal{T}$ gives a variant of Newton's method
which converges at least as fast.

We do not know whether this variant is substantially faster. See Figure 10 for a geometrical
interpretation of both methods.

[Figure 10, panels (a) to (d): plots omitted.]

Fig. 10. Geometrical interpretation of Newton's method: (a) Given a point $x \in R$, Newton's method first
considers the "enlarged" quadrics defined by $q_i(X) = q_i(x)$ (drawn dashed and dotted), which contain
the current approximation $x$. (b) Then the tangents in $x$ at these enlarged quadrics are computed
(drawn dotted), i.e., $q_i'(x) \cdot (X - x) = 0$. (c) Finally, these tangents are corrected by moving them
towards the actual quadrics, i.e., $q_i'(x) \cdot (X - x) = -q_i(x)$. The intersection of these corrected tangents
gives the next Newton approximation. (d) A comparison between $\mathcal{N}(x)$ and $\mathcal{T}(x)$: $\mathcal{N}(x)$, resp. $\mathcal{T}(x)$, is
given by the intersection of the dotted, resp. dashed, lines. Clearly, we have $\mathcal{N}(x) \le \mathcal{T}(x)$.

9 Conclusions

We have studied the convergence order and convergence rate of Newton's method for fixed-point
equations of systems of positive polynomials (SPP equations). These equations appear naturally
in the analysis of several stochastic computational models that have been intensely studied in
recent years, and they also play a central role in the theory of stochastic branching processes.
The restriction to positive coefficients leads to strong results. For arbitrary polynomial equations
Newton's method may not converge or converge only locally, i.e., when started at a point
sufficiently close to the solution. We have extended a result by Etessami and Yannakakis [EY09],
and shown that for SPP equations the method always converges starting at 0. Moreover, we
have proved that the method has at least linear convergence order, and have determined the
asymptotic convergence rate. To the best of our knowledge, this is the first time that a lower
bound on the convergence order is proved for a significant class of equations with a trivial
membership test.* Finally, in the case of strongly connected SPPs we have also obtained upper
bounds on the threshold, i.e., the number of iterations necessary to reach the "steady state" in
which valid bits are computed at the asymptotic rate. These results lead to practical tests for
checking whether the least fixed point of a strongly connected SPP exceeds a given bound.
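As a small illustration of linear convergence from 0, the following sketch runs Newton's method on a scalar SPP. The example system $X = \tfrac12 X^2 + \tfrac12$ (with $\mu f = 1$) and the helper name `newton_spp` are assumptions chosen for illustration; they are not taken from the paper. For this system each Newton step halves the error, so the $k$-th iterate equals $1 - 2^{-k}$ and gains exactly one valid bit per iteration.

```python
def newton_spp(f, df, steps):
    """Newton iterates for the scalar equation X = f(X), started at 0.

    Each step computes nu + (1 - f'(nu))^(-1) * (f(nu) - nu).
    """
    nu = 0.0
    iterates = [nu]
    for _ in range(steps):
        nu = nu + (f(nu) - nu) / (1.0 - df(nu))
        iterates.append(nu)
    return iterates


# Hypothetical example: f(X) = 0.5 X^2 + 0.5, least fixed point mu f = 1.
f = lambda x: 0.5 * x * x + 0.5
df = lambda x: x  # derivative of f

iterates = newton_spp(f, df, 20)
errors = [1.0 - nu for nu in iterates]  # distance to mu f after k steps

for k in (0, 1, 2, 10, 20):
    print(k, iterates[k], errors[k])
```

The derivative $1 - f'(\mu f)$ vanishes at the fixed point here, which is why the convergence is only linear rather than quadratic.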

It is worth mentioning that in a recent paper we study the behavior of Newton's method
when arithmetic operations only have a fixed accuracy [EGK10]. We develop an algorithm for a
relevant class of SPPs that computes iterations of Newton's method, increasing the accuracy on
demand. A simple test applied after each iteration decides whether the round-off errors have become
too large, in which case the accuracy is increased.

There are still at least two important open questions. The first one is: can one provide a
bound on the threshold valid for arbitrary SPPs, and not only for strongly connected ones?
Since SPPs cannot be solved exactly in general, we cannot first compute the exact solution
for the bottom SCCs, insert it in the SCCs above them, and iterate. We can only compute an
approximation, and we are not currently able to bound the propagation of the error. For the
second question, say that Newton's method is polynomial for a class of SPP equations if there
is a polynomial $p(x, y, z)$ such that for every $k \ge 0$ and for every system in the class with $n$
equations and coefficients of size $m$, the $p(n, m, k)$-th Newton approximant has $k$
valid bits. We have proved in Theorem 5.12 that Newton's method is polynomial for strongly
connected SPPs $f$ satisfying $f(0) \succ 0$; for this class one can take $p(n, m, k) = 7mn + k$. We
have also exhibited in § 7 a class for which computing the first bit of the least solution takes
$2^n$ iterations. The members of this class, however, are not strongly connected, and this is the
fact we have exploited to construct them. So the following question remains open: Is Newton's
method polynomial for strongly connected SPPs?

Acknowledgments. We thank Kousha Etessami for several illuminating discussions, and 
two anonymous referees for helpful suggestions. 

A Proof of Lemma 6.3 

The proof of Lemma 6.3 is by a sequence of lemmata. The proof of Lemma A.1 and, consequently,
the proof of Lemma 6.3 are non-constructive in the sense that we cannot give a particular $C_f$.
Therefore, we often use the equivalence of norms, disregard the constants that link them, and
state the results in terms of an arbitrary norm.

The following two Lemmata A.1 and A.2 provide a lower bound on $\|f(x) - x\|$ for an
"almost-fixed-point" $x$.

* Notice the contrast with the classical result stating that if $(\mathrm{Id} - f'(\mu f))$ is non-singular, then Newton's
method has exponential convergence order; here the membership test is highly non-trivial and,
as far as we know, as hard as computing $\mu f$ itself.

Lemma A.1. Let $f$ be a quadratic, clean and feasible SPP without linear terms, i.e., $f(X) =
B(X, X) + c$ where $B$ is a bilinear map, and $c$ is a constant vector. Let $f(X)$ be non-constant
in every component. Let $R \uplus S = \{1, \ldots, n\}$ with $S \ne \emptyset$. Let every component depend on every
$S$-component and not on any $R$-component. Then there is a constant $C_f > 0$ such that

\[ \|f(\mu f - \delta) - (\mu f - \delta)\| \ge C_f \cdot \|\delta\|^2 \]

for all $\delta$ with $0 \le \delta \le \mu f$.

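Proof. With the given component dependencies we can write $f(X)$ as follows:

\[ f(X) = \begin{pmatrix} f_R(X) \\ f_S(X) \end{pmatrix} = \begin{pmatrix} B_R(X_S, X_S) + c_R \\ B_S(X_S, X_S) + c_S \end{pmatrix}. \]

A straightforward calculation shows

\[ e(\delta) := f(\mu f - \delta) - (\mu f - \delta) = (\mathrm{Id} - f'(\mu f))\,\delta + B(\delta, \delta). \]

Furthermore, $\partial_{X_R} f$ is constant zero in all entries, so

\[ \begin{aligned}
e_R(\delta) &= \delta_R - \partial_{X_S} f_R(\mu f) \cdot \delta_S + B_R(\delta_S, \delta_S) \quad\text{and} \\
e_S(\delta) &= \delta_S - \partial_{X_S} f_S(\mu f) \cdot \delta_S + B_S(\delta_S, \delta_S).
\end{aligned} \]

Notice that for every real number $r > 0$ we have

\[ \min_{0 \le \delta \le \mu f,\ \|\delta\| \ge r} \frac{\|e(\delta)\|}{\|\delta\|^2} > 0 \]

because otherwise $\mu f - \delta < \mu f$ would be a fixed point of $f$. We have to show:

\[ \inf_{0 \le \delta \le \mu f,\ \|\delta\| > 0} \frac{\|e(\delta)\|}{\|\delta\|^2} > 0. \]

Assume, for a contradiction, that this infimum equals zero. Then there exists a sequence $(\delta^{(i)})_{i \in \mathbb{N}}$
with $0 \le \delta^{(i)} \le \mu f$ and $\|\delta^{(i)}\| > 0$ such that $\lim_{i \to \infty} \|\delta^{(i)}\| = 0$ and
$\lim_{i \to \infty} \frac{\|e(\delta^{(i)})\|}{\|\delta^{(i)}\|^2} = 0$. Define $r^{(i)} := \|\delta^{(i)}\|$ and $d^{(i)} := \frac{\delta^{(i)}}{\|\delta^{(i)}\|}$.
Notice that $d^{(i)} \in \{d \in \mathbb{R}^n_{\ge 0} \mid \|d\| = 1\} =: D$ where $D$ is
compact. So some subsequence of $(d^{(i)})_{i \in \mathbb{N}}$, say w.l.o.g. the sequence $(d^{(i)})_{i \in \mathbb{N}}$ itself, converges
to some vector $d^* \in D$. By our assumption we have

\[ \frac{\|e(\delta^{(i)})\|}{\|\delta^{(i)}\|^2} = \left\| \frac{1}{r^{(i)}} (\mathrm{Id} - f'(\mu f))\, d^{(i)} + B(d^{(i)}, d^{(i)}) \right\| \longrightarrow 0. \tag{15} \]

As $B(d^{(i)}, d^{(i)})$ is bounded, $\frac{1}{r^{(i)}} (\mathrm{Id} - f'(\mu f))\, d^{(i)}$ must be bounded, too. Since $r^{(i)}$ converges to
0, $\|(\mathrm{Id} - f'(\mu f))\, d^{(i)}\|$ must converge to 0, so

\[ (\mathrm{Id} - f'(\mu f))\, d^* = 0. \]

In particular, $((\mathrm{Id} - f'(\mu f))\, d^*)_R = d^*_R - \partial_{X_S} f_R(\mu f) \cdot d^*_S = 0$. So we have $d^*_S > 0$, because
$d^*_S = 0$ would imply $d^*_R = 0$, which would contradict $d^* > 0$.

In the remainder of the proof we focus on $f_S$. Define the scSPP $g(X_S) := f_S(X)$. Notice
that $\mu g = \mu f_S$. We can apply Lemma 5.3 to $g$ and $d^*_S$ and obtain $d^*_S \succ 0$. As $f_S(X)$ is non-constant
we get $B_S(d^*_S, d^*_S) \succ 0$. By (15), $\frac{1}{r^{(i)}} (\mathrm{Id} - g'(\mu g))\, d^{(i)}_S$ converges to $-B_S(d^*_S, d^*_S) \prec 0$.
So there is a $j \in \mathbb{N}$ such that $(\mathrm{Id} - g'(\mu g))\, d^{(j)}_S \prec 0$. Let $\delta := r\, d^{(j)}$ for some small enough $r > 0$
such that $0 < \delta_S \le \mu g$ and

\[ \begin{aligned}
e_S(\delta) &= (\mathrm{Id} - g'(\mu g))\, \delta_S + B_S(\delta_S, \delta_S) \\
&= r\, (\mathrm{Id} - g'(\mu g))\, d^{(j)}_S + r^2 B_S(d^{(j)}_S, d^{(j)}_S) \prec 0.
\end{aligned} \]

So we have $g(\mu g - \delta_S) \prec \mu g - \delta_S$. However, $\mu g$ is the least point $x$ with $g(x) \le x$. Thus we
get the desired contradiction. □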
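The identity for $e(\delta)$ and the quadratic lower bound can be checked numerically on a toy instance. The sketch below assumes the scalar system $f(X) = \tfrac12 X^2 + \tfrac12$ (so $B(x, y) = \tfrac12 xy$, $c = \tfrac12$, $\mu f = 1$, $S = \{1\}$, $R = \emptyset$); this example and the function names are illustrative assumptions, not part of the paper. Since $f'(\mu f) = 1$, the linear part of $e(\delta)$ vanishes and the ratio $\|e(\delta)\| / \|\delta\|^2$ is constantly $\tfrac12$.

```python
MU_F = 1.0  # least fixed point of the hypothetical example below


def f(x):
    """Hypothetical scalar SPP without linear terms: f(X) = 0.5 X^2 + 0.5."""
    return 0.5 * x * x + 0.5


def e_direct(delta):
    """e(delta) computed from its definition f(mu f - delta) - (mu f - delta)."""
    return f(MU_F - delta) - (MU_F - delta)


def e_formula(delta):
    """e(delta) computed as (Id - f'(mu f)) * delta + B(delta, delta)."""
    f_prime_at_mu = MU_F  # f'(x) = x for this example
    return (1.0 - f_prime_at_mu) * delta + 0.5 * delta * delta


deltas = [1.0, 0.5, 0.1, 0.01]
ratios = [abs(e_direct(d)) / d ** 2 for d in deltas]
print(ratios)
```

In this critical case the bound of Lemma A.1 is tight with $C_f = \tfrac12$; a bound linear in $\|\delta\|$ would fail as $\delta \to 0$.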

Lemma A.2. Let $f$ be a quadratic, clean and feasible scSPP. Then there is a constant $C_f > 0$
such that

\[ \|f(\mu f - \delta) - (\mu f - \delta)\| \ge C_f \cdot \|\delta\|^2 \]

for all $\delta$ with $0 \le \delta \le \mu f$.



Proof. Write $f(X) = B(X, X) + LX + c$ for a bilinear map $B$, a matrix $L$ and a constant
vector $c$. By Theorem 4.1.2 the matrix $L^* = (\mathrm{Id} - L)^{-1} = (\mathrm{Id} - f'(0))^{-1}$ exists. Define the
SPP $\tilde f(X) := L^* B(X, X) + L^* c$. A straightforward calculation shows that the sets of fixed
points of $f$ and $\tilde f$ coincide and that

\[ f(\mu f - \delta) - (\mu f - \delta) = (\mathrm{Id} - L) \left( \tilde f(\mu f - \delta) - (\mu f - \delta) \right). \]

Further, if $\sigma_n(\mathrm{Id} - L)$ denotes the smallest singular value of $\mathrm{Id} - L$, we have by basic facts about
singular values (see [HJ91], Chapter 3) that

\[ \left\| (\mathrm{Id} - L) \left( \tilde f(\mu f - \delta) - (\mu f - \delta) \right) \right\|_2 \ge \sigma_n(\mathrm{Id} - L) \left\| \tilde f(\mu f - \delta) - (\mu f - \delta) \right\|_2. \]

Note that $\sigma_n(\mathrm{Id} - L) > 0$ because $\mathrm{Id} - L$ is invertible. So it suffices to show that

\[ \left\| \tilde f(\mu f - \delta) - (\mu f - \delta) \right\| \ge C_f \cdot \|\delta\|^2. \]

If $\tilde f(X)$ is linear (i.e., $B(X, X) \equiv 0$) then $\tilde f(X)$ is constant and we have
$\|\tilde f(\mu f - \delta) - (\mu f - \delta)\| = \|\delta\|$, so we are done in that case. Hence we can assume that some
component of $B(X, X)$ is not the zero polynomial. It remains to argue that $\tilde f$ satisfies the
preconditions of Lemma A.1. By definition, $\tilde f$ does not have linear terms. Define

\[ S := \{ i \mid 1 \le i \le n,\ X_i \text{ is contained in a component of } B(X, X) \}. \]

Notice that $S$ is non-empty. Let $i_0, i_1, \ldots, i_m, i_{m+1}$ ($m \ge 0$) be any sequence such that, in $f$, for
all $j$ with $0 \le j < m$ the component $i_j$ depends directly on $i_{j+1}$ via a linear term and $i_m$ depends
directly on $i_{m+1}$ via a quadratic term. Then $i_0$ depends directly on $i_{m+1}$ via a quadratic term
in $L^m B(X, X)$ and hence also in $\tilde f$. So all components are non-constant and depend (directly
or indirectly) on every $S$-component. Furthermore, no component depends on a component that
is not in $S$, because $L^* B(X, X)$ contains only $S$-components. Thus, Lemma A.1 can be applied,
and the statement follows. □
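The transformation from $f$ to $\tilde f$ can be illustrated on a scalar instance. The sketch below assumes the system $f(X) = \tfrac14 X^2 + \tfrac12 X + \tfrac14$, for which $L = \tfrac12$, $L^* = 2$ and $\tilde f(X) = \tfrac12 X^2 + \tfrac12$; the example and all names are hypothetical choices for illustration. It checks that both maps share the fixed point 1 and that $f(x) - x = (\mathrm{Id} - L)(\tilde f(x) - x)$.

```python
# Hypothetical scalar scSPP f(X) = B*X^2 + L*X + C (coefficients are assumptions).
B_COEF, L, C = 0.25, 0.5, 0.25
L_STAR = 1.0 / (1.0 - L)  # L* = (Id - L)^(-1) in the scalar case


def f(x):
    return B_COEF * x * x + L * x + C


def f_tilde(x):
    """The SPP without linear terms: L* B(X, X) + L* c."""
    return L_STAR * B_COEF * x * x + L_STAR * C


points = [0.0, 0.3, 0.7, 1.0]
# Left- and right-hand side of f(x) - x = (Id - L) (f_tilde(x) - x).
lhs = [f(x) - x for x in points]
rhs = [(1.0 - L) * (f_tilde(x) - x) for x in points]
print(list(zip(lhs, rhs)))
```

Removing the linear part this way is what allows Lemma A.1, which is stated for SPPs without linear terms, to be applied.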



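The following lemma gives a bound on the propagation error for the case that $f$ has a single
top SCC.

Lemma A.3. Let $f$ be a quadratic, clean and feasible SPP. Let $S \subseteq \{1, \ldots, n\}$ be the single
top SCC of $f$. Let $L := \{1, \ldots, n\} \setminus S$. Then there is a constant $C_f > 0$ such that

\[ \|\mu f_S - \tilde\mu_S\| \le C_f \cdot \sqrt{\|\mu f_L - x_L\|} \]

for all $x_L$ with $0 \le x_L \le \mu f_L$ where $\tilde\mu_S := \mu\left(f_S[X_L / x_L]\right)$.

Proof. We write $f_S(X) = f_S(X_S, X_L)$ in the following.

If $S$ is a trivial SCC then $\mu f_S = f_S(0, \mu f_L)$ and $\tilde\mu_S = f_S(0, x_L)$. In this case we have with
Taylor's theorem (cf. Lemma 2.3)

\[ \begin{aligned}
\|\mu f_S - \tilde\mu_S\| &= \|f_S(0, \mu f_L) - f_S(0, x_L)\| \\
&\le \|\partial_X f_S(0, \mu f_L)\| \cdot \|\mu f_L - x_L\| \\
&= \|\partial_X f_S(0, \mu f_L)\| \cdot \sqrt{\|\mu f_L - x_L\|} \cdot \sqrt{\|\mu f_L - x_L\|} \\
&\le \|\partial_X f_S(0, \mu f_L)\| \cdot \sqrt{\|\mu f_L\|} \cdot \sqrt{\|\mu f_L - x_L\|}
\end{aligned} \]

and the statement follows by setting $C_f := \|\partial_X f_S(0, \mu f_L)\| \cdot \sqrt{\|\mu f_L\|}$.

Hence, in the following we can assume that $S$ is a non-trivial SCC. Set $g(X_S) :=
f_S(X_S, \mu f_L)$. Notice that $g$ is an scSPP with $\mu g = \mu f_S$. By applying Lemma A.2 to $g$ and
setting $c := 1 / \sqrt{C_g}$ (the $C_g$ from Lemma A.2) we get

\[ \begin{aligned}
\|\mu f_S - \tilde\mu_S\| &\le c \cdot \sqrt{\left\| g(\mu g - (\mu f_S - \tilde\mu_S)) - (\mu g - (\mu f_S - \tilde\mu_S)) \right\|} \\
&= c \cdot \sqrt{\left\| f_S(\tilde\mu_S, \mu f_L) - f_S(\tilde\mu_S, x_L) \right\|}
\end{aligned} \]

and with Taylor's theorem (cf. Lemma 2.3)

\[ \le c \cdot \sqrt{\left\| \partial_{X_L} f_S(\tilde\mu_S, \mu f_L)\, (\mu f_L - x_L) \right\|}. \]

Since $\tilde\mu_S \le \mu f_S$ and the partial derivatives of $f_S$ are monotone, the statement follows by setting
$C_f := c \cdot \sqrt{\|\partial_{X_L} f_S(\mu f_S, \mu f_L)\|}$. □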

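Now we can extend Lemma A.3 to Lemma 6.3, restated here.

Lemma 6.3. There is a constant $C_f > 0$ such that

\[ \|\mu f_{[t]} - \tilde\mu_{[t]}\| \le C_f \cdot \sqrt{\|\mu f_{[>t]} - \rho_{[>t]}\|} \]

holds for all $\rho_{[>t]}$ with $0 \le \rho_{[>t]} \le \mu f_{[>t]}$, where $\tilde\mu_{[t]} = \mu\left(f_{[t]}[[>t] / \rho_{[>t]}]\right)$.

Proof. Observe that $\mu f_{[t]}$, $\mu f_{[>t]}$ and $\rho_{[>t]}$ do not depend on the components of depth $< t$.
So we can assume w.l.o.g. that $t = 0$. Let $SCC(0) = \{S_1, \ldots, S_k\}$.

For any $S_i$ from $SCC(0)$, let $f^{(i)}$ be obtained from $f$ by removing all top SCCs except for $S_i$.
Lemma A.3 applied to $f^{(i)}$ guarantees a $C^{(i)}$ such that

\[ \|\mu f_{S_i} - \tilde\mu_{S_i}\| \le C^{(i)} \cdot \sqrt{\|\mu f_{[>0]} - \rho_{[>0]}\|} \]

holds for all $\rho_{[>0]}$ with $0 \le \rho_{[>0]} \le \mu f_{[>0]}$. Using the equivalence of norms let w.l.o.g. the norm
$\|\cdot\|$ be the maximum-norm $\|\cdot\|_\infty$. Let $C_f := \max_{1 \le i \le k} C^{(i)}$. Then we have

\[ \|\mu f_{[0]} - \tilde\mu_{[0]}\| = \max_{1 \le i \le k} \|\mu f_{S_i} - \tilde\mu_{S_i}\| \le C_f \cdot \sqrt{\|\mu f_{[>0]} - \rho_{[>0]}\|} \]

for all $\rho_{[>0]}$ with $0 \le \rho_{[>0]} \le \mu f_{[>0]}$. □

B Proofs of § 8

B.1 Proof of Lemma 8.2

Lemma 8.2. For every quadric $q_i$ induced by a clean and feasible scSPP $f$ we have

\[ q_i'(x) = \left( \partial_{X_1} q_i(x), \partial_{X_2} q_i(x), \ldots, \partial_{X_n} q_i(x) \right) \ne 0 \quad\text{and}\quad \partial_{X_i} q_i(x) < 0 \qquad \forall x \in [0, \mu f). \]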

Proof. As shown by Etessami and Yannakakis in [EY09], under the above preconditions it holds
for all $x \in [0, \mu f)$ that $(\mathrm{Id} - f'(x))$ is invertible with

\[ (\mathrm{Id} - f'(x))^{-1} = f'(x)^*. \]

Thus, we have

\[ q'(x)^{-1} = (f'(x) - \mathrm{Id})^{-1} = -(f'(x)^*), \]

implying that $q_i'(x) \ne 0$ for all $x \in [0, \mu f)$ as $q'(x)$ has to have full rank $n$ in order for $q'(x)^{-1}$
to exist. Furthermore, it follows that all entries of $q'(x)^{-1}$ are non-positive as $f'(x)^*$ is non-negative.
Now, as $q_i(X) = f_i(X) - X_i$ and $f_i(X)$ is a polynomial with non-negative coefficients,
it holds that

\[ q_i'(x) \cdot e_j = \partial_{X_j} q_i(x) = \partial_{X_j} f_i(x) \ge 0 \]

for all $j \ne i$ and $x \ge 0$. With every entry of $q'(x)^{-1}$ non-positive, and

\[ q_i'(x) \cdot q'(x)^{-1} = e_i^\top \]

we conclude $\partial_{X_i} q_i(x) < 0$.

□
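The identity $q'(x)^{-1} = -(f'(x)^*)$ with $f'(x)^* = \sum_{k \in \mathbb{N}} f'(x)^k$ can be checked numerically. The sketch below assumes a hypothetical two-variable scSPP, $f_1 = \tfrac14 X_1^2 + \tfrac14 X_2 + \tfrac14$ and $f_2 = \tfrac14 X_2^2 + \tfrac14 X_1 + \tfrac14$, evaluated at an arbitrarily chosen point below its least fixed point; the system, the point and the helper names are illustrative assumptions only.

```python
def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def star(m, terms=200):
    """Partial sum Id + m + m^2 + ... + m^terms of the series defining m*."""
    total = [[1.0, 0.0], [0.0, 1.0]]
    power = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(terms):
        power = matmul(power, m)
        total = [[total[i][j] + power[i][j] for j in range(2)] for i in range(2)]
    return total


def inverse2(a):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]


x = (0.2, 0.3)                                        # assumed point in [0, mu f)
f_prime = [[0.5 * x[0], 0.25], [0.25, 0.5 * x[1]]]    # Jacobian f'(x)
q_prime = [[f_prime[0][0] - 1.0, f_prime[0][1]],      # q'(x) = f'(x) - Id
           [f_prime[1][0], f_prime[1][1] - 1.0]]

f_star = star(f_prime)
q_inv = inverse2(q_prime)
print(f_star, q_inv)
```

All entries of `q_inv` come out non-positive and the diagonal of `q_prime` is negative, matching the two conclusions of the lemma for this instance.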



B.2 Proof of Lemma 8.4 



We first summarize some properties of the functions $h_i$.

Proposition B.1. Let $f$ be a clean and feasible scSPP. Let $x, y \in [0, \mu f]$ with $x \le y$.

(a) $0 \le h_i^{(k)} \le \mu f_i$.

(b) $h_i^{(k)}(x_{-i}) \le h_i^{(k+1)}(x_{-i})$ for all $k \in \mathbb{N}$.

(c) $h_i^{(k)}(x_{-i}) \le h_i^{(k)}(y_{-i})$ for all $k \in \mathbb{N}$.

(d) $h_i(x_{-i}) \le \mu f_i$, and $h_i$ is a map from $[0, \mu f_{-i}]$ to $[0, \mu f_i]$.
If $f_i$ depends on at least one other variable except $X_i$, we also have $h_i([0, \mu f_{-i})) \subseteq [0, \mu f_i)$.

(e) $h_i(x_{-i}) \le h_i(y_{-i})$.

(f) $f_i(x_{-i}, h_i(x_{-i})) = h_i(x_{-i})$.

(g) For $x_i = f_i(x)$ we have $h_i(x_{-i}) \le x_i$.

(h) $h_i(\mu f_{-i}) = \mu f_i$.

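Proof. Let $0 \le x \le y \le \mu f$. Using the monotonicity of $f_i$ over $\mathbb{R}^n_{\ge 0}$ we proceed by induction on
$k$.

(a) For $k = 0$ we have

\[ 0 \le h_i^{(0)}(x_{-i}) = f_i(0, x_{-i}) \le f_i(\mu f_i, \mu f_{-i}) = \mu f_i. \]

We then get

\[ 0 \le h_i^{(k+1)}(x_{-i}) = f_i(h_i^{(k)}(x_{-i}), x_{-i}) \le f_i(\mu f_i, \mu f_{-i}) = \mu f_i. \]

(b) For $k = 0$ we have

\[ h_i^{(0)}(x_{-i}) = f_i(0, x_{-i}) \le f_i(h_i^{(0)}(x_{-i}), x_{-i}) = h_i^{(1)}(x_{-i}). \]

Thus

\[ h_i^{(k+1)}(x_{-i}) = f_i(h_i^{(k)}(x_{-i}), x_{-i}) \le f_i(h_i^{(k+1)}(x_{-i}), x_{-i}) = h_i^{(k+2)}(x_{-i}) \]

follows.

(c) As $x \le y$, we have for $k = 0$

\[ h_i^{(0)}(x_{-i}) = f_i(0, x_{-i}) \le f_i(0, y_{-i}) = h_i^{(0)}(y_{-i}). \]

Hence, we get

\[ h_i^{(k+1)}(x_{-i}) = f_i(h_i^{(k)}(x_{-i}), x_{-i}) \le f_i(h_i^{(k)}(y_{-i}), y_{-i}) = h_i^{(k+1)}(y_{-i}). \]

(d) As the sequence $(h_i^{(k)}(x_{-i}))_{k \in \mathbb{N}}$ is monotonically increasing and bounded from above by $\mu f_i$,
the sequence converges. Thus, for every $x$ the value

\[ h_i(x_{-i}) = \lim_{k \to \infty} h_i^{(k)}(x_{-i}) \]

is well-defined, i.e., $h_i$ is a map from $[0, \mu f_{-i}]$ to $[0, \mu f_i]$.

If $f_i$ depends on at least one other variable except $X_i$, then $h_i$ is a non-constant power series
in this variable with non-negative coefficients. For $x_{-i} \in [0, \mu f_{-i})$ we thus always have

\[ h_i(x_{-i}) < h_i(\mu f_{-i}) = \mu f_i, \]

as $x_{-i} \prec \mu f_{-i}$.

(e) This follows immediately from (c).

(f) As $f_i$ is continuous, we have

\[ f_i(h_i(x_{-i}), x_{-i}) = f_i\left( \lim_{k \to \infty} h_i^{(k)}(x_{-i}), x_{-i} \right) = \lim_{k \to \infty} h_i^{(k+1)}(x_{-i}) = h_i(x_{-i}), \]

where the last equality holds because of (b).

(g) Using induction similar to (a), replacing $\mu f$ by $x$, one gets $h_i^{(k)}(x_{-i}) \le x_i$ for all $k \in \mathbb{N}$ as
$f_i(x) = x_i$. Thus, $h_i(x_{-i}) \le x_i$ follows similarly to (d).

(h) By definition, we have $\mu f = \lim_{k \to \infty} f^k(0)$. For $k = 0$, we have

\[ (f^0(0))_i = 0 \le f_i(0, \mu f_{-i}) = h_i^{(0)}(\mu f_{-i}). \]

We thus get by induction

\[ (f^{k+1}(0))_i = f_i(f^k(0)) \le f_i(h_i^{(k)}(\mu f_{-i}), \mu f_{-i}) = h_i^{(k+1)}(\mu f_{-i}). \]

Thus, we may conclude $\mu f_i \le h_i(\mu f_{-i})$. As $\mu f_i = f_i(\mu f)$, we get by virtue of (g) that
$h_i(\mu f_{-i}) \le \mu f_i$, too.

□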



With Proposition B.1 at hand, we can now show Lemma 8.4:

Lemma 8.4. $h_i$ is continuously differentiable with

\[ \partial_{X_j} h_i(x_{-i}) = \frac{\partial_{X_j} f_i(x)}{-\partial_{X_i} q_i(x)} = \frac{\partial_{X_j} q_i(x)}{-\partial_{X_i} q_i(x)} \quad\text{for } x \in S_i \text{ and } j \ne i. \]

In particular, $\partial_{X_j} h_i$ is monotonically increasing with $x$.

Proof. By Lemma 8.2 the implicit function theorem is applicable for every $x \in S_i$. We therefore
find for every $x \in S_i$ a local parametrization $h_x : U \to V$ with $h_x(x_{-i}) = x_i$. Thus $h_x(x_{-i})$ is the
least non-negative solution of $q_i(X_i, x_{-i}) = 0$. By continuity of $q_i$ it is now easily shown that for
$y_{-i} \in U$ it has to hold that $h_x(y_{-i})$ is also the least non-negative solution of $q_i(X_i, y_{-i}) = 0$
(see below). By uniqueness we therefore have $h_x = h_i$ and that $h_i$ is continuously differentiable
for all $x_{-i} \in [0, \mu f_{-i})$.

For every $x_{-i} \in [0, \mu f_{-i})$ we can solve the (at most) quadratic equation $q_i(X_i, x_{-i}) = 0$.
We already know that $h_i(x_{-i})$ is the least non-negative solution of this equation. So, if there
exists another solution, it has to be real, too.

Assume first that this equation has two distinct solutions for some fixed $x_{-i} \in [0, \mu f_{-i})$.
Solving $q_i(X_i, x_{-i}) = 0$ thus leads to an expression of the form

\[ \frac{-b(x_{-i}) \pm \sqrt{b(x_{-i})^2 - 4a \cdot c(x_{-i})}}{2a} \]

for the solutions where $b, c$ are (at most) quadratic polynomials in $X_{-i}$, $c$ having non-negative
coefficients, and $a$ is a positive constant (leading coefficient of $X_i^2$ in $q_i(X)$). As $b$ and $c$ are
continuous, the discriminant $b(\cdot)^2 - 4a \cdot c(\cdot)$ stays positive for some open ball around $x_{-i}$ included
inside of $U$ (it is positive in $x_{-i}$ as we assume that we have two distinct solutions). By making $U$
smaller, we may assume that $U$ is this open ball. One of the two solutions must then be the least
non-negative solution. As $h_x$ is the least non-negative solution for $x_{-i}$, and $h_x$ is continuous,
this also has to hold for some open ball centered at $x_{-i}$. W.l.o.g., $U$ is this ball. So, $h_x$ and $h_i$
coincide on $U$.

We turn to the case that $q_i(X_i, x_{-i}) = 0$ has only a single solution, i.e., $h_i(x_{-i})$. Note
that $q_i(X)$ is linear in $X_i$ if and only if $q_i(X_i, x_{-i})$ is linear in $X_i$. Obviously, if $q_i$ is linear in
$X_i$, then $h_i$ and $h_x$ coincide on $U$. Thus, consider the case that $q_i(X)$ is quadratic in $X_i$,
but $q_i(X_i, x_{-i}) = 0$ has only a single solution. This means that $x_{-i}$ is a root of the discriminant,
i.e., $b(x_{-i})^2 - 4a\,c(x_{-i}) = 0$. As $h_i(y_{-i})$ is a solution of $q_i(X_i, y_{-i}) = 0$ for all $y_{-i} \in U$, the
discriminant is non-negative on $U$. If it is equal to zero on $U$, then we again have that $h_i$ is equal to
$h_x$ on $U$. Therefore assume that it is positive in some point of $U$. As the discriminant is continuous,
the solutions change continuously with $X_{-i}$. But this implies that for some $y_{-i} \in U$ there are at
least two $y_i, y_i^* \in V$ such that $(y_{-i}, y_i)$ and $(y_{-i}, y_i^*)$ are both located on the quadric $q_i(X) = 0$.
But this contradicts the uniqueness of $h_x$ guaranteed by the implicit function theorem.

Assume now that $x \in S_i$. We then have

\[ q_i(x) = q_i(x_{-i}, h_i(x_{-i})) = 0, \]

or equivalently

\[ f_i(x_{-i}, h_i(x_{-i})) = h_i(x_{-i}). \]

Calculating the gradient of both in $x$ yields

\[ f_i'(x) \cdot p_i'(x_{-i}) = h_i'(x_{-i}). \]

The Jacobian $p_i'(x_{-i})$ of $p_i$ is the identity matrix on the coordinates $j \ne i$, with the row $h_i'(x_{-i})$
in position $i$. This leads to

\[ \partial_{X_j} f_i(x) + \partial_{X_i} f_i(x) \cdot \partial_{X_j} h_i(x_{-i}) = \partial_{X_j} h_i(x_{-i}) \]

which solved for $\partial_{X_j} h_i$ yields

\[ \partial_{X_j} h_i(x_{-i}) = \frac{\partial_{X_j} f_i(x)}{1 - \partial_{X_i} f_i(x)} = \frac{\partial_{X_j} f_i(x)}{-\partial_{X_i} q_i(x)}. \]

As $\partial_{X_i} q_i(x) < 0$ and both $\partial_{X_j} f_i$ and $\partial_{X_i} q_i$ monotonically increase with $x$, it follows that $\partial_{X_j} h_i$
also monotonically increases with $x$. Finally, for $j \ne i$ we have that $\partial_{X_j} q_i = \partial_{X_j} f_i$ as $q_i = f_i - X_i$.

□
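The construction of $h_i$ as a limit of the iterates $h_i^{(k)}$ and the derivative formula of Lemma 8.4 can be tried out on a concrete component. The sketch below assumes the hypothetical component $f_1(X_1, X_2) = \tfrac14 X_1^2 + \tfrac14 X_2 + \tfrac14$, for which the least non-negative solution of $q_1(X_1, x_2) = 0$ has the closed form $2 - \sqrt{3 - x_2}$; the example, the evaluation point and the function names are assumptions made for illustration.

```python
import math


def f1(x1, x2):
    """Hypothetical component f_1(X1, X2) = 0.25 X1^2 + 0.25 X2 + 0.25."""
    return 0.25 * x1 * x1 + 0.25 * x2 + 0.25


def h1_iterates(x2, steps=100):
    """Iterates h^(0) = f_1(0, x2), h^(k+1) = f_1(h^(k), x2)."""
    value = f1(0.0, x2)
    values = [value]
    for _ in range(steps):
        value = f1(value, x2)
        values.append(value)
    return values


def h1(x2):
    """Approximate limit of the iterates, i.e. h_1(x2)."""
    return h1_iterates(x2)[-1]


x2 = 0.2
iterates = h1_iterates(x2)
closed_form = 2.0 - math.sqrt(3.0 - x2)

# Lemma 8.4: d h_1 / d X2 = (d f_1 / d X2) / (-(d q_1 / d X1)) on the surface S_1.
x1 = h1(x2)
formula = 0.25 / (1.0 - 0.5 * x1)
step = 1e-6
finite_difference = (h1(x2 + step) - h1(x2 - step)) / (2.0 * step)
print(x1, closed_form, formula, finite_difference)
```

The iterates increase monotonically towards the closed-form value, as stated in Proposition B.1 (b) and (d), and the finite-difference slope agrees with the formula up to discretisation error.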



B.3 Proof of Lemma 8.8 



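Lemma 8.8. For all $x \in S_i$ we have

\[ \forall y \in S_i \cap [x, \mu f] : \quad q_i'(x) \cdot (y - x) \le 0. \]

In particular

\[ \forall y \in S_i \cap [x, \mu f] : \quad y_i \ge x_i + \sum_{j \ne i} \partial_{X_j} h_i(x_{-i}) \cdot (y_j - x_j). \]

Proof. Let $x \in S_i$, i.e., $f_i(x) = x_i$. We want to show that

\[ q_i'(x) \cdot (y - x) \le 0 \]

for all $y \in S_i \cap [x, \mu f)$. As $f_i$ is quadratic in $X$, we may write

\[ \begin{aligned}
0 &= q_i(y) \\
&= -y_i + f_i(y) \\
&= -y_i + \underbrace{f_i(x)}_{= x_i} + f_i'(x) \cdot (y - x) + \underbrace{(y - x)^\top \cdot A \cdot (y - x)}_{\ge 0} \\
&\ge -y_i + x_i + f_i'(x) \cdot (y - x) \\
&= f_i'(x) \cdot (y - x) - e_i^\top \cdot (y - x) \\
&= q_i'(x) \cdot (y - x)
\end{aligned} \]

where $A$ is a symmetric square-matrix with non-negative components such that the quadric
terms of $f_i$ are given by $X^\top A X$.

The second claim is easily obtained by solving this inequality for $y_i$ and recalling that by
Lemma 8.4 we have $\partial_{X_j} h_i(x_{-i}) = \frac{\partial_{X_j} q_i(p_i(x_{-i}))}{-\partial_{X_i} q_i(p_i(x_{-i}))}$ and $\partial_{X_i} q_i(p_i(x_{-i})) < 0$. □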
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B.4 Proof of Proposition 8.9 



Proposition 8.9. It holds

\[ x \in R \iff x \in [0, \mu f) \wedge q(x) \ge 0. \]

Proof. Let $x \in R$ and $i \in \{1, \ldots, n\}$. Consider the function

\[ g(t) := q_i(p_i(x_{-i}) + t e_i). \]

As $q_i$ is a quadratic polynomial in $X$ there exists a symmetric square-matrix $A$ with non-negative
entries, a vector $b$, and a constant $c$ such that

\[ q_i(X) = X^\top A X + b^\top X + c. \]

It then follows that

\[ q_i(X + Y) = q_i(X) + q_i'(X) Y + Y^\top A Y. \]

With $q_i(p_i(x_{-i})) = 0$ this implies

\[ g(t) = q_i'(p_i(x_{-i}))\, t e_i + t^2 e_i^\top A e_i = t \cdot \left( \partial_{X_i} q_i(p_i(x_{-i})) + a \cdot t \right) \quad\text{with } a := e_i^\top A e_i \ge 0. \]

As $p_i(x_{-i}) < \mu f$ ($f$ is strongly connected and $x \in [0, \mu f)$), we know that $\partial_{X_i} q_i(p_i(x_{-i})) < 0$.
Thus, $g(t)$ has at most two zeros, one at 0, the other for some $t^* > 0$.

For the direction ($\Rightarrow$) we only have to show that $x_i \le h_i(x_{-i})$ implies that $q_i(x) \ge 0$. This
now easily follows as $x_i \le h_i(x_{-i})$ implies that there is a $t' \le 0$ with $p_i(x_{-i}) + t' e_i = x$. But for
this $t' \le 0$ we have $q_i(x) = g(t') \ge 0$.

Consider therefore the other direction ($\Leftarrow$), that is $x \in [0, \mu f)$ with $q(x) \ge 0$. Assume that
$x \notin R$, i.e., for at least one $i$ we have $x_i > h_i(x_{-i})$. As $q_i(x) \ge 0$ there has to be a $t'' > 0$ with
$p_i(x_{-i}) + t'' e_i = x$ and $g(t'') \ge 0$. This implies that $a > 0$ has to hold as otherwise $g(t)$ would
be linear in $t$ and negative for $t > 0$. But then the second root $t^*$ of $g(t)$ has to be positive. Set
$x^* = p_i(x_{-i}) + t^* e_i$ with $q_i(x^*) = 0$, too.

A calculation similar to the one from above leads to

\[ g(t + t^*) = q_i(x^* + t e_i) = t \cdot \left( \partial_{X_i} q_i(x^*) + a \cdot t \right). \]

It follows that $\partial_{X_i} q_i(x^*)$ has to be greater than zero for $-t^*$ to be a root (as $a > 0$). But we
have shown that $\partial_{X_i} q_i(x) < 0$ for all $x \in [0, \mu f)$. □
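Proposition 8.9 yields a membership test for $R$ that only evaluates the quadrics and does not need $h_i$. The sketch below compares this test with the direct condition $x_i \le h_i(x_{-i})$ on a grid, assuming the hypothetical scSPP $f_1 = \tfrac14 X_1^2 + \tfrac14 X_2 + \tfrac14$, $f_2 = \tfrac14 X_2^2 + \tfrac14 X_1 + \tfrac14$, whose least fixed point has both components equal to $(3 - \sqrt5)/2$ and for which $h_i$ has the closed form $2 - \sqrt{3 - x_{-i}}$. The system, the grid and the function names are assumptions for illustration.

```python
import math

MU = (3.0 - math.sqrt(5.0)) / 2.0  # each component of mu f for the example


def q(x1, x2):
    """Quadrics q_i(x) = f_i(x) - x_i of the hypothetical system."""
    return (0.25 * x1 * x1 + 0.25 * x2 + 0.25 - x1,
            0.25 * x2 * x2 + 0.25 * x1 + 0.25 - x2)


def h(other):
    """Least non-negative solution of q_i(X_i, x_{-i}) = 0 (same form for i = 1, 2)."""
    return 2.0 - math.sqrt(3.0 - other)


def in_R_by_surfaces(x1, x2):
    """Direct condition: x_i <= h_i(x_{-i}) for both components."""
    return x1 <= h(x2) and x2 <= h(x1)


def in_R_by_quadrics(x1, x2):
    """Condition from Proposition 8.9: x in [0, mu f) and q(x) >= 0."""
    return 0.0 <= x1 < MU and 0.0 <= x2 < MU and min(q(x1, x2)) >= 0.0


grid = [0.02 * k for k in range(20)]  # grid points 0.00 .. 0.38, all below MU
agreement = [in_R_by_surfaces(a, b) == in_R_by_quadrics(a, b)
             for a in grid for b in grid]
print(all(agreement), sum(in_R_by_quadrics(a, b) for a in grid for b in grid))
```

The two tests agree on the whole grid; points exactly on a quadric would be sensitive to rounding, which is why the grid above avoids them.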

B.5 Proof of Lemma 8.11 

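Lemma 8.11. Let $f$ be a clean and feasible scSPP. Let $x^{(1)}, \ldots, x^{(n)} \in [0, \mu f)$. Then the
matrix

\[ \begin{pmatrix} q_1'(x^{(1)}) \\ \vdots \\ q_n'(x^{(n)}) \end{pmatrix} \]

is regular, i.e., the vectors $\{q_i'(x^{(i)}) \mid i = 1, \ldots, n\}$ are linearly independent.

Proof. Define $x \in [0, \mu f)$ by setting

\[ x_i := \max\{ x_i^{(j)} \mid j = 1, \ldots, n \}. \]

We then have $x^{(i)} \le x$ for all $i$, and $x \prec \mu f$. As mentioned above, we therefore have that $q'(x)$
is regular with

\[ q'(x)^{-1} = -\sum_{k \in \mathbb{N}} f'(x)^k. \]

As $x^{(i)} \le x$ it follows that

\[ \begin{pmatrix} f_1'(x^{(1)}) \\ \vdots \\ f_n'(x^{(n)}) \end{pmatrix} \le f'(x). \]

Hence, we also have

\[ \sum_{k \in \mathbb{N}} \begin{pmatrix} f_1'(x^{(1)}) \\ \vdots \\ f_n'(x^{(n)}) \end{pmatrix}^{k} \le \sum_{k \in \mathbb{N}} f'(x)^k \]

implying that

\[ \begin{pmatrix} f_1'(x^{(1)}) \\ \vdots \\ f_n'(x^{(n)}) \end{pmatrix}^{*} \quad\text{and, thus,}\quad \begin{pmatrix} q_1'(x^{(1)}) \\ \vdots \\ q_n'(x^{(n)}) \end{pmatrix}^{-1} \quad\text{exist.} \]

So, the vectors $\{q_1'(x^{(1)}), \ldots, q_n'(x^{(n)})\}$ have to be linearly independent. □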
B.6 Proof of Theorem 8.13 

Theorem 8.13. Let $f$ be a clean and feasible scSPP. Let $x \in R$. For $i = 1, \ldots, n$ fix some
$\eta_i \in [x_i, h_i(x_{-i})]$, and set $\eta = (\eta_1, \ldots, \eta_n)$. We then have

\[ x \le \mathcal{N}(x) \le \mathcal{T}_\eta(x) \le \mathcal{T}(x) \le \mu f. \]

Further, the operator $\mathcal{T}$ is monotone on $R$, i.e., for any $y \in R$ with $x \le y$ it holds that
$\mathcal{T}(x) \le \mathcal{T}(y)$.

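Proof. Set

\[ \pi_i := (x_{-i}, \eta_i) \quad\text{and}\quad h := (h_1(x_{-1}), \ldots, h_n(x_{-n})). \]

We first show that $x \le \mathcal{T}_\eta(x)$. Writing $(f_i'(\pi_i))_{i=1,\ldots,n}$ for the matrix whose $i$-th row is $f_i'(\pi_i)$, we have

\[ \begin{aligned}
\mathcal{T}_\eta(x) &= (f_i'(\pi_i))^*_{i=1,\ldots,n} \cdot \left( -q_i'(\pi_i) \cdot (x + (\eta_i - x_i) \cdot e_i) + q_i(\pi_i) \right)_{i=1,\ldots,n} \\
&= x + \underbrace{(f_i'(\pi_i))^*_{i=1,\ldots,n}}_{\ge 0 \text{ in every comp.}} \cdot \Big( \underbrace{-\partial_{X_i} q_i(\pi_i)}_{> 0} \cdot \underbrace{(\eta_i - x_i)}_{\ge 0} + \underbrace{q_i(\pi_i)}_{\ge 0} \Big)_{i=1,\ldots,n} \\
&\ge x.
\end{aligned} \]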

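$\mathcal{T}_\eta(x)$ is by definition the (unique) solution of the equation system defined by

\[ q_i'(\pi_i)(X - \pi_i) = -q_i(\pi_i) \qquad (i = 1, \ldots, n). \]

As $\mathcal{T}_\eta(x) \ge x$ we can also consider this system with the origin of the coordinate system moved
into $x$, i.e.

\[ q_i'(\pi_i)(X + x - \pi_i) = -q_i(\pi_i) \qquad (i = 1, \ldots, n). \]

We show that this system is equivalent to an SPP. For this, we solve these equations for $X_i$:

\[ \begin{aligned}
& q_i'(\pi_i)(X + x - \pi_i) = -q_i(\pi_i) \\
\iff\ & q_i'(\pi_i) X = -q_i(\pi_i) + q_i'(\pi_i)(\pi_i - x) \\
\iff\ & X_i = \sum_{j \ne i} \frac{\partial_{X_j} q_i(\pi_i)}{-\partial_{X_i} q_i(\pi_i)}\, X_j + \frac{q_i(\pi_i)}{-\partial_{X_i} q_i(\pi_i)} + (\eta_i - x_i).
\end{aligned} \]

Again, we have $\partial_{X_i} q_i(\pi_i) < 0 \le \partial_{X_j} q_i(\pi_i)$ as $\pi_i \in R$, and $q_i'(\pi_i)$ monotonically increases with
$\eta_i$. Hence, the above linear equation for $X_i$ is indeed a polynomial with non-negative coefficients.
Denote by $f_\eta$ the SPP defined by these linear equations. We then have $\mu f_\eta = \mathcal{T}_\eta(x) - x$ as
the above equation system has $\mathcal{T}_\eta(x) - x \ge 0$ as its unique solution. Further, we know that the
Kleene sequence $(f_\eta^k(0))_{k \in \mathbb{N}}$ converges to $\mu f_\eta$. We show that all coefficients of $f_\eta$ increase with
$\eta \to h$. This is straightforward for

\[ \frac{\partial_{X_j} q_i(\pi_i)}{-\partial_{X_i} q_i(\pi_i)} \]

as $\partial_{X_i} q_i(\pi_i) < 0 \le \partial_{X_j} q_i(\pi_i)$, and all these terms increase with $\eta_i \to h_i(x_{-i})$. Consider therefore

\[ \frac{q_i(\pi_i)}{-\partial_{X_i} q_i(\pi_i)} + (\eta_i - x_i) = \frac{q_i(\pi_i) - \partial_{X_i} q_i(\pi_i)(\eta_i - x_i)}{-\partial_{X_i} q_i(\pi_i)}. \]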

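We show that this term increases with $\eta_i$. Set $\delta_i := \eta_i - x_i$. We can find a non-negative, symmetric
square-matrix $A$, a vector $b$, and a constant $c$ such that

\[ q_i(X) = X^\top A X + b^\top X + c \quad\text{and}\quad q_i'(X) = 2 X^\top A + b^\top. \]

As $\pi_i = x + \delta_i e_i$ we have

\[ q_i(\pi_i) = q_i(x + \delta_i e_i) = q_i(x) + \partial_{X_i} q_i(x)\, \delta_i + \delta_i^2 A_{ii}, \]

and

\[ \partial_{X_i} q_i(\pi_i) \cdot \delta_i = q_i'(x + \delta_i e_i)\, \delta_i e_i = \partial_{X_i} q_i(x)\, \delta_i + 2 \delta_i^2 A_{ii}. \]

This leads to

\[ \frac{q_i(\pi_i) - \partial_{X_i} q_i(\pi_i)\, \delta_i}{-\partial_{X_i} q_i(\pi_i)} = \frac{q_i(x) - \delta_i^2 A_{ii}}{-\partial_{X_i} q_i(x) - 2 \delta_i A_{ii}}. \]

Taking the derivative w.r.t. $\delta_i$ yields:

\[ \begin{aligned}
& \frac{-2 A_{ii} \delta_i}{-\partial_{X_i} q_i(x) - 2 A_{ii} \delta_i} + \frac{q_i(x) - A_{ii} \delta_i^2}{(-\partial_{X_i} q_i(x) - 2 A_{ii} \delta_i)^2} \cdot 2 A_{ii} \\
&\quad = \frac{2 A_{ii}\, \partial_{X_i} q_i(x)\, \delta_i + 4 A_{ii}^2 \delta_i^2 + 2 A_{ii}\, q_i(x) - 2 A_{ii}^2 \delta_i^2}{(-\partial_{X_i} q_i(x) - 2 A_{ii} \delta_i)^2} \\
&\quad = 2 A_{ii} \cdot \frac{A_{ii} \delta_i^2 + \partial_{X_i} q_i(x)\, \delta_i + q_i(x)}{(-\partial_{X_i} q_i(x) - 2 A_{ii} \delta_i)^2}
= 2 A_{ii} \cdot \frac{q_i(\pi_i)}{(-\partial_{X_i} q_i(\pi_i))^2}.
\end{aligned} \]

As $q_i(\pi_i) \ge 0$ and $A_{ii} \ge 0$, it follows that

\[ \frac{q_i(\pi_i)}{-\partial_{X_i} q_i(\pi_i)} + (\eta_i - x_i) \]

increases with $\eta_i \to h_i(x_{-i})$. Thus, all coefficients of $f_\eta$ increase with $\eta_i \to h_i(x_{-i})$, and so for
any $\eta' \in [\eta, h]$ it follows that

\[ f_\eta(y) \le f_{\eta'}(y) \quad\text{for all } y \ge 0, \]

and

\[ \mathcal{T}_\eta(x) - x = \mu f_\eta \le \mu f_{\eta'} = \mathcal{T}_{\eta'}(x) - x. \]

As $\mathcal{N}(X) = \mathcal{T}_x(X)$ and $\mathcal{T}(X) = \mathcal{T}_h(X)$ we may therefore conclude that

\[ \mathcal{N}(x) \le \mathcal{T}_\eta(x) \le \mathcal{T}_{\eta'}(x) \le \mathcal{T}(x). \]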

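It remains to show that $\mathcal{T}(x) \le \mu f$. This is equivalent to showing that $\mu f_h \le \mu f - x$. For
$f_h(X)$ we have by definition and Lemma 8.4

\[ (f_h(X))_i = \sum_{j \ne i} \partial_{X_j} h_i(x_{-i}) \cdot X_j + (h_i(x_{-i}) - x_i). \]

By virtue of Lemma 8.8 it follows that $\mu f$ is above all the tangents, i.e.

\[ f_h(\mu f - x) \le \mu f - x. \]

By monotonicity of $f_h$ we also have

\[ f_h(0) \le f_h(\mu f - x) \le \mu f - x. \]

A straightforward induction therefore shows that

\[ f_h^k(0) \le \mu f - x \qquad (\forall k \in \mathbb{N}), \]

and, thus,

\[ \mathcal{T}(x) - x = \mu f_h \le \mu f - x. \]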

We turn to the monotonicity of $\mathcal{T}$. Let $y \in R$ with $x \le y$. Assume that $x$ and $y$ are located on
the surface $S_i$, i.e.

\[ h_i(x_{-i}) = x_i \quad\text{and}\quad h_i(y_{-i}) = y_i. \]

The tangent $T_i|_x$ at $S_i$ in $x$ is spanned by the partial derivatives of $p_i$ in $x$. The part $T_i|_x \cap [x, \mu f]$
relevant for $\mathcal{T}(x)$ can therefore be parameterized by

\[ x + \sum_{j \ne i} \partial_{X_j} p_i(x) \cdot (u_j - x_j) \quad\text{with } u_{-i} \in [x_{-i}, \mu f_{-i}]. \]

Similarly for $T_i|_y$.

In particular, for $u_{-i} \in [y_{-i}, \mu f_{-i}]$ both points on the tangents defined by $u_{-i}$ differ only
in the $i$-th coordinate being (the remaining coordinates are simply $u_{-i}$)

\[ t_y = y_i + \sum_{j \ne i} \partial_{X_j} h_i(y) \cdot (u_j - y_j), \quad\text{resp.}\quad t_x = x_i + \sum_{j \ne i} \partial_{X_j} h_i(x) \cdot (u_j - x_j). \]

By Lemma 8.8 we have

\[ y_i \ge x_i + \sum_{j \ne i} \partial_{X_j} h_i(x) \cdot (y_j - x_j). \]

From Lemma 8.4 it follows that $\partial_{X_j} h_i(y) \ge \partial_{X_j} h_i(x)$. Thus $t_y \ge t_x$ immediately follows.

Now for $x, y \in R$ with $x \le y$ we can apply this result to the tangents at $S_i$ in $p_i(x_{-i})$, resp.
$p_i(y_{-i})$, and $\mathcal{T}(x) \le \mathcal{T}(y)$ follows.